In this document, we create the queries and visualizations that drive our reporting of results.

Load in Data

This is the data we used to fit the models.

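# load packages: tidyverse provides read_csv() and %>%, brms provides brm()
library(tidyverse)
library(brms)
# read in data
model_df <- read_csv("model-data.csv")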
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   worker_id = col_character(),
##   condition = col_character(),
##   start_means = col_logical(),
##   gender = col_character(),
##   age = col_character(),
##   education = col_character(),
##   chart_use = col_character(),
##   strategy_with_means = col_character(),
##   strategy_without_means = col_character(),
##   outcome = col_logical(),
##   means = col_logical(),
##   exclude = col_logical()
## )
## See spec(...) for full column specifications.
# preprocessing
model_df <- model_df %>% 
  mutate(
    # factors for modeling
    means = as.factor(means),
    start_means = as.factor(start_means),
    sd_diff = as.factor(sd_diff),
    condition = factor(condition, levels = c("densities", "intervals", "HOPs", "QDPs")), # reorder levels
    # evidence scale for decision model
    p_diff = p_award_with - (p_award_without + (1 / award_value)),
    evidence = qlogis(p_award_with) - qlogis(p_award_without + (1 / award_value))
  )
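
To make the two derived columns concrete, here is a minimal sketch that applies the same formulas to a single made-up trial. The variable names mirror the columns used in the preprocessing code above; the numeric values are hypothetical and are not taken from model-data.csv.

```r
# Hypothetical values for one trial (illustration only, not from the data)
p_award_with    <- 0.8   # probability of the award with the intervention
p_award_without <- 0.5   # probability of the award without it
award_value     <- 10    # made-up value; 1 / award_value shifts the baseline

# Same formulas as in the mutate() call above
threshold <- p_award_without + (1 / award_value)
p_diff    <- p_award_with - threshold                  # difference in probability
evidence  <- qlogis(p_award_with) - qlogis(threshold)  # difference in log odds

p_diff
evidence
```

Both quantities are zero when p_award_with equals the shifted baseline and positive when it exceeds it. The evidence column expresses that gap in log odds units rather than probability units.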

Probability of Superiority

We load the model of probability of superiority judgments that we arrived at through the process of model expansion described in our preregistration (https://osf.io/9kpmb). This is a hierarchical linear model of probability of superiority judgments in which both the judgments and the ground truth have been transformed onto a log odds scale, making it a linear in log odds (LLO) model. See the paper and experiment/analysis/PSuperiority.Rmd in the supplemental materials for details.
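
As a rough illustration of what a linear in log odds relationship implies, the sketch below maps true probabilities to predicted judgments with a single intercept and slope. The helper llo_predict and the coefficient values are assumptions made for this example; they are not part of the analysis code and ignore the hierarchical structure and interactions of the fitted model.

```r
# Illustrative LLO mapping from true to judged probability.
# llo_predict is a hypothetical helper, not a function from brms or the analysis.
llo_predict <- function(p_true, intercept, slope) {
  # transform the ground truth to log odds, apply the linear model,
  # then map the result back onto the probability scale
  plogis(intercept + slope * qlogis(p_true))
}

p_true <- c(0.5, 0.75, 0.9)

# Hypothetical coefficients: a slope below 1 pulls judgments toward 50%
llo_predict(p_true, intercept = 0, slope = 0.5)

# An intercept of 0 and a slope of 1 reproduce the ground truth
llo_predict(p_true, intercept = 0, slope = 1)
```

With these made-up coefficients, a true probability of 0.9 maps to a judged probability of 0.75, since halving log(9) on the log odds scale gives log(3). In a model of this form, slopes closer to 1 correspond to judgments that track the ground truth more closely.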

# hierarchical linear log odds model
m.p_sup <- brm(data = model_df, family = "gaussian",
             formula = bf(lo_p_sup ~  (1 + lo_ground_truth*trial + means*sd_diff|worker_id) + lo_ground_truth*means*sd_diff*condition*start_means + lo_ground_truth*condition*trial,
                          sigma ~ (1 + lo_ground_truth + trial|worker_id) + lo_ground_truth*condition*trial + means*start_means),
             prior = c(prior(normal(1, 0.5), class = b),
                       prior(normal(1.3, 1), class = Intercept),
                       prior(normal(0, 0.15), class = sd, group = worker_id),
                       prior(normal(0, 0.3), class = b, dpar = sigma),
                       prior(normal(0, 0.15), class = sd, dpar = sigma),
                       prior(lkj(4), class = cor)),
             iter = 12000, warmup = 2000, chains = 2, cores = 2, thin = 2,
             control = list(adapt_delta = 0.99, max_treedepth = 12),
             file = "model-fits/llo_mdl-min-r_means_sd_trial_block_sigma_gt_trial_means_block")
summary(m.p_sup)
##  Family: gaussian 
##   Links: mu = identity; sigma = log 
## Formula: lo_p_sup ~ (1 + lo_ground_truth * trial + means * sd_diff | worker_id) + lo_ground_truth * means * sd_diff * condition * start_means + lo_ground_truth * condition * trial 
##          sigma ~ (1 + lo_ground_truth + trial | worker_id) + lo_ground_truth * condition * trial + means * start_means
##    Data: model_df (Number of observations: 19892) 
## Samples: 2 chains, each with iter = 12000; warmup = 2000; thin = 2;
##          total post-warmup samples = 10000
## 
## Group-Level Effects: 
## ~worker_id (Number of levels: 622) 
##                                                Estimate Est.Error l-95% CI
## sd(Intercept)                                      0.06      0.01     0.05
## sd(lo_ground_truth)                                0.39      0.01     0.37
## sd(trial)                                          0.03      0.01     0.00
## sd(meansTRUE)                                      0.03      0.01     0.02
## sd(sd_diff15)                                      0.08      0.01     0.07
## sd(lo_ground_truth:trial)                          0.24      0.02     0.21
## sd(meansTRUE:sd_diff15)                            0.06      0.01     0.04
## sd(sigma_Intercept)                                1.18      0.03     1.12
## sd(sigma_lo_ground_truth)                          0.41      0.01     0.38
## sd(sigma_trial)                                    1.18      0.04     1.11
## cor(Intercept,lo_ground_truth)                    -0.47      0.09    -0.64
## cor(Intercept,trial)                               0.20      0.23    -0.30
## cor(lo_ground_truth,trial)                        -0.24      0.23    -0.64
## cor(Intercept,meansTRUE)                           0.03      0.19    -0.32
## cor(lo_ground_truth,meansTRUE)                    -0.60      0.13    -0.81
## cor(trial,meansTRUE)                               0.19      0.24    -0.33
## cor(Intercept,sd_diff15)                          -0.01      0.11    -0.22
## cor(lo_ground_truth,sd_diff15)                     0.03      0.09    -0.15
## cor(trial,sd_diff15)                               0.01      0.22    -0.44
## cor(meansTRUE,sd_diff15)                           0.01      0.17    -0.33
## cor(Intercept,lo_ground_truth:trial)              -0.28      0.10    -0.46
## cor(lo_ground_truth,lo_ground_truth:trial)         0.41      0.06     0.29
## cor(trial,lo_ground_truth:trial)                  -0.34      0.24    -0.71
## cor(meansTRUE,lo_ground_truth:trial)              -0.14      0.16    -0.44
## cor(sd_diff15,lo_ground_truth:trial)               0.07      0.09    -0.10
## cor(Intercept,meansTRUE:sd_diff15)                -0.33      0.13    -0.59
## cor(lo_ground_truth,meansTRUE:sd_diff15)           0.23      0.13    -0.04
## cor(trial,meansTRUE:sd_diff15)                     0.16      0.23    -0.32
## cor(meansTRUE,meansTRUE:sd_diff15)                 0.03      0.19    -0.33
## cor(sd_diff15,meansTRUE:sd_diff15)                -0.30      0.12    -0.52
## cor(lo_ground_truth:trial,meansTRUE:sd_diff15)    -0.12      0.12    -0.36
## cor(sigma_Intercept,sigma_lo_ground_truth)        -0.71      0.02    -0.75
## cor(sigma_Intercept,sigma_trial)                   0.10      0.04     0.02
## cor(sigma_lo_ground_truth,sigma_trial)            -0.05      0.04    -0.13
##                                                u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)                                      0.07 1.00     4599     6863
## sd(lo_ground_truth)                                0.42 1.00     2957     6210
## sd(trial)                                          0.06 1.00     1202     2432
## sd(meansTRUE)                                      0.04 1.00     1557     2756
## sd(sd_diff15)                                      0.09 1.00     4960     7384
## sd(lo_ground_truth:trial)                          0.27 1.00     2197     5227
## sd(meansTRUE:sd_diff15)                            0.07 1.00     4550     7284
## sd(sigma_Intercept)                                1.25 1.00     3965     6120
## sd(sigma_lo_ground_truth)                          0.43 1.00     5222     7056
## sd(sigma_trial)                                    1.26 1.00     6867     8631
## cor(Intercept,lo_ground_truth)                    -0.28 1.00      465     1148
## cor(Intercept,trial)                               0.60 1.00     6938     8874
## cor(lo_ground_truth,trial)                         0.28 1.00     5378     6303
## cor(Intercept,meansTRUE)                           0.41 1.00     2445     4981
## cor(lo_ground_truth,meansTRUE)                    -0.30 1.00     2912     6200
## cor(trial,meansTRUE)                               0.62 1.00     2342     4018
## cor(Intercept,sd_diff15)                           0.21 1.00     3239     5650
## cor(lo_ground_truth,sd_diff15)                     0.20 1.00     4570     8400
## cor(trial,sd_diff15)                               0.44 1.00      458      802
## cor(meansTRUE,sd_diff15)                           0.32 1.00      836     1725
## cor(Intercept,lo_ground_truth:trial)              -0.08 1.00     1236     3298
## cor(lo_ground_truth,lo_ground_truth:trial)         0.53 1.00     6874     8129
## cor(trial,lo_ground_truth:trial)                   0.21 1.00      420      785
## cor(meansTRUE,lo_ground_truth:trial)               0.19 1.00      699     1635
## cor(sd_diff15,lo_ground_truth:trial)               0.23 1.00     3400     6099
## cor(Intercept,meansTRUE:sd_diff15)                -0.06 1.00     4618     7518
## cor(lo_ground_truth,meansTRUE:sd_diff15)           0.48 1.00     5079     8054
## cor(trial,meansTRUE:sd_diff15)                     0.58 1.00     1129     2019
## cor(meansTRUE,meansTRUE:sd_diff15)                 0.41 1.00     2273     4918
## cor(sd_diff15,meansTRUE:sd_diff15)                -0.05 1.00     4774     6954
## cor(lo_ground_truth:trial,meansTRUE:sd_diff15)     0.12 1.00     3833     8074
## cor(sigma_Intercept,sigma_lo_ground_truth)        -0.67 1.00     5963     8590
## cor(sigma_Intercept,sigma_trial)                   0.17 1.00     6192     7482
## cor(sigma_lo_ground_truth,sigma_trial)             0.04 1.00     4770     6838
## 
## Population-Level Effects: 
##                                                                        Estimate
## Intercept                                                                 -0.02
## sigma_Intercept                                                           -1.72
## lo_ground_truth                                                            0.46
## meansTRUE                                                                 -0.01
## sd_diff15                                                                  0.04
## conditionHOPs                                                             -0.09
## conditionintervals                                                        -0.01
## conditionQDPs                                                              0.02
## start_meansTRUE                                                            0.01
## trial                                                                     -0.05
## lo_ground_truth:meansTRUE                                                 -0.04
## lo_ground_truth:sd_diff15                                                  0.08
## meansTRUE:sd_diff15                                                        0.02
## lo_ground_truth:conditionHOPs                                             -0.01
## lo_ground_truth:conditionintervals                                        -0.10
## lo_ground_truth:conditionQDPs                                              0.07
## meansTRUE:conditionHOPs                                                    0.09
## meansTRUE:conditionintervals                                               0.02
## meansTRUE:conditionQDPs                                                   -0.02
## sd_diff15:conditionHOPs                                                    0.03
## sd_diff15:conditionintervals                                               0.02
## sd_diff15:conditionQDPs                                                   -0.01
## lo_ground_truth:start_meansTRUE                                           -0.14
## meansTRUE:start_meansTRUE                                                 -0.01
## sd_diff15:start_meansTRUE                                                  0.01
## conditionHOPs:start_meansTRUE                                              0.08
## conditionintervals:start_meansTRUE                                         0.00
## conditionQDPs:start_meansTRUE                                             -0.01
## lo_ground_truth:trial                                                      0.12
## conditionHOPs:trial                                                        0.01
## conditionintervals:trial                                                   0.03
## conditionQDPs:trial                                                        0.05
## lo_ground_truth:meansTRUE:sd_diff15                                        0.04
## lo_ground_truth:meansTRUE:conditionHOPs                                   -0.08
## lo_ground_truth:meansTRUE:conditionintervals                              -0.01
## lo_ground_truth:meansTRUE:conditionQDPs                                   -0.01
## lo_ground_truth:sd_diff15:conditionHOPs                                    0.05
## lo_ground_truth:sd_diff15:conditionintervals                              -0.01
## lo_ground_truth:sd_diff15:conditionQDPs                                    0.02
## meansTRUE:sd_diff15:conditionHOPs                                         -0.01
## meansTRUE:sd_diff15:conditionintervals                                    -0.02
## meansTRUE:sd_diff15:conditionQDPs                                         -0.00
## lo_ground_truth:meansTRUE:start_meansTRUE                                  0.04
## lo_ground_truth:sd_diff15:start_meansTRUE                                  0.02
## meansTRUE:sd_diff15:start_meansTRUE                                       -0.02
## lo_ground_truth:conditionHOPs:start_meansTRUE                             -0.07
## lo_ground_truth:conditionintervals:start_meansTRUE                         0.04
## lo_ground_truth:conditionQDPs:start_meansTRUE                              0.14
## meansTRUE:conditionHOPs:start_meansTRUE                                   -0.09
## meansTRUE:conditionintervals:start_meansTRUE                               0.01
## meansTRUE:conditionQDPs:start_meansTRUE                                    0.02
## sd_diff15:conditionHOPs:start_meansTRUE                                   -0.02
## sd_diff15:conditionintervals:start_meansTRUE                              -0.01
## sd_diff15:conditionQDPs:start_meansTRUE                                   -0.02
## lo_ground_truth:conditionHOPs:trial                                       -0.02
## lo_ground_truth:conditionintervals:trial                                   0.01
## lo_ground_truth:conditionQDPs:trial                                        0.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs                         -0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals                     0.03
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs                         -0.03
## lo_ground_truth:meansTRUE:sd_diff15:start_meansTRUE                        0.04
## lo_ground_truth:meansTRUE:conditionHOPs:start_meansTRUE                    0.12
## lo_ground_truth:meansTRUE:conditionintervals:start_meansTRUE               0.03
## lo_ground_truth:meansTRUE:conditionQDPs:start_meansTRUE                   -0.01
## lo_ground_truth:sd_diff15:conditionHOPs:start_meansTRUE                    0.02
## lo_ground_truth:sd_diff15:conditionintervals:start_meansTRUE               0.01
## lo_ground_truth:sd_diff15:conditionQDPs:start_meansTRUE                   -0.01
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE                          0.05
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE                     0.03
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE                          0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE         -0.08
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE    -0.05
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE         -0.01
## sigma_lo_ground_truth                                                      0.46
## sigma_conditionHOPs                                                        0.59
## sigma_conditionintervals                                                   0.17
## sigma_conditionQDPs                                                       -0.05
## sigma_trial                                                               -0.46
## sigma_meansTRUE                                                           -0.00
## sigma_start_meansTRUE                                                     -0.04
## sigma_lo_ground_truth:conditionHOPs                                       -0.18
## sigma_lo_ground_truth:conditionintervals                                  -0.10
## sigma_lo_ground_truth:conditionQDPs                                       -0.03
## sigma_lo_ground_truth:trial                                                0.02
## sigma_conditionHOPs:trial                                                  0.08
## sigma_conditionintervals:trial                                             0.14
## sigma_conditionQDPs:trial                                                 -0.03
## sigma_meansTRUE:start_meansTRUE                                           -0.22
## sigma_lo_ground_truth:conditionHOPs:trial                                  0.04
## sigma_lo_ground_truth:conditionintervals:trial                             0.06
## sigma_lo_ground_truth:conditionQDPs:trial                                 -0.02
##                                                                        Est.Error
## Intercept                                                                   0.02
## sigma_Intercept                                                             0.09
## lo_ground_truth                                                             0.04
## meansTRUE                                                                   0.02
## sd_diff15                                                                   0.02
## conditionHOPs                                                               0.03
## conditionintervals                                                          0.02
## conditionQDPs                                                               0.02
## start_meansTRUE                                                             0.02
## trial                                                                       0.02
## lo_ground_truth:meansTRUE                                                   0.02
## lo_ground_truth:sd_diff15                                                   0.02
## meansTRUE:sd_diff15                                                         0.02
## lo_ground_truth:conditionHOPs                                               0.07
## lo_ground_truth:conditionintervals                                          0.06
## lo_ground_truth:conditionQDPs                                               0.06
## meansTRUE:conditionHOPs                                                     0.03
## meansTRUE:conditionintervals                                                0.02
## meansTRUE:conditionQDPs                                                     0.03
## sd_diff15:conditionHOPs                                                     0.04
## sd_diff15:conditionintervals                                                0.03
## sd_diff15:conditionQDPs                                                     0.03
## lo_ground_truth:start_meansTRUE                                             0.06
## meansTRUE:start_meansTRUE                                                   0.03
## sd_diff15:start_meansTRUE                                                   0.03
## conditionHOPs:start_meansTRUE                                               0.04
## conditionintervals:start_meansTRUE                                          0.03
## conditionQDPs:start_meansTRUE                                               0.03
## lo_ground_truth:trial                                                       0.03
## conditionHOPs:trial                                                         0.04
## conditionintervals:trial                                                    0.03
## conditionQDPs:trial                                                         0.03
## lo_ground_truth:meansTRUE:sd_diff15                                         0.02
## lo_ground_truth:meansTRUE:conditionHOPs                                     0.04
## lo_ground_truth:meansTRUE:conditionintervals                                0.03
## lo_ground_truth:meansTRUE:conditionQDPs                                     0.03
## lo_ground_truth:sd_diff15:conditionHOPs                                     0.03
## lo_ground_truth:sd_diff15:conditionintervals                                0.02
## lo_ground_truth:sd_diff15:conditionQDPs                                     0.03
## meansTRUE:sd_diff15:conditionHOPs                                           0.04
## meansTRUE:sd_diff15:conditionintervals                                      0.03
## meansTRUE:sd_diff15:conditionQDPs                                           0.03
## lo_ground_truth:meansTRUE:start_meansTRUE                                   0.03
## lo_ground_truth:sd_diff15:start_meansTRUE                                   0.02
## meansTRUE:sd_diff15:start_meansTRUE                                         0.03
## lo_ground_truth:conditionHOPs:start_meansTRUE                               0.09
## lo_ground_truth:conditionintervals:start_meansTRUE                          0.09
## lo_ground_truth:conditionQDPs:start_meansTRUE                               0.09
## meansTRUE:conditionHOPs:start_meansTRUE                                     0.05
## meansTRUE:conditionintervals:start_meansTRUE                                0.04
## meansTRUE:conditionQDPs:start_meansTRUE                                     0.04
## sd_diff15:conditionHOPs:start_meansTRUE                                     0.05
## sd_diff15:conditionintervals:start_meansTRUE                                0.04
## sd_diff15:conditionQDPs:start_meansTRUE                                     0.04
## lo_ground_truth:conditionHOPs:trial                                         0.05
## lo_ground_truth:conditionintervals:trial                                    0.04
## lo_ground_truth:conditionQDPs:trial                                         0.05
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs                           0.04
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals                      0.03
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs                           0.03
## lo_ground_truth:meansTRUE:sd_diff15:start_meansTRUE                         0.03
## lo_ground_truth:meansTRUE:conditionHOPs:start_meansTRUE                     0.05
## lo_ground_truth:meansTRUE:conditionintervals:start_meansTRUE                0.04
## lo_ground_truth:meansTRUE:conditionQDPs:start_meansTRUE                     0.04
## lo_ground_truth:sd_diff15:conditionHOPs:start_meansTRUE                     0.04
## lo_ground_truth:sd_diff15:conditionintervals:start_meansTRUE                0.03
## lo_ground_truth:sd_diff15:conditionQDPs:start_meansTRUE                     0.03
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE                           0.05
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE                      0.04
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE                           0.04
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE           0.05
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE      0.04
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE           0.04
## sigma_lo_ground_truth                                                       0.03
## sigma_conditionHOPs                                                         0.12
## sigma_conditionintervals                                                    0.12
## sigma_conditionQDPs                                                         0.12
## sigma_trial                                                                 0.10
## sigma_meansTRUE                                                             0.03
## sigma_start_meansTRUE                                                       0.07
## sigma_lo_ground_truth:conditionHOPs                                         0.05
## sigma_lo_ground_truth:conditionintervals                                    0.05
## sigma_lo_ground_truth:conditionQDPs                                         0.05
## sigma_lo_ground_truth:trial                                                 0.05
## sigma_conditionHOPs:trial                                                   0.14
## sigma_conditionintervals:trial                                              0.14
## sigma_conditionQDPs:trial                                                   0.14
## sigma_meansTRUE:start_meansTRUE                                             0.05
## sigma_lo_ground_truth:conditionHOPs:trial                                   0.07
## sigma_lo_ground_truth:conditionintervals:trial                              0.07
## sigma_lo_ground_truth:conditionQDPs:trial                                   0.07
##                                                                        l-95% CI
## Intercept                                                                 -0.05
## sigma_Intercept                                                           -1.90
## lo_ground_truth                                                            0.37
## meansTRUE                                                                 -0.04
## sd_diff15                                                                 -0.00
## conditionHOPs                                                             -0.14
## conditionintervals                                                        -0.05
## conditionQDPs                                                             -0.02
## start_meansTRUE                                                           -0.03
## trial                                                                     -0.10
## lo_ground_truth:meansTRUE                                                 -0.08
## lo_ground_truth:sd_diff15                                                  0.04
## meansTRUE:sd_diff15                                                       -0.02
## lo_ground_truth:conditionHOPs                                             -0.14
## lo_ground_truth:conditionintervals                                        -0.23
## lo_ground_truth:conditionQDPs                                             -0.06
## meansTRUE:conditionHOPs                                                    0.02
## meansTRUE:conditionintervals                                              -0.03
## meansTRUE:conditionQDPs                                                   -0.07
## sd_diff15:conditionHOPs                                                   -0.04
## sd_diff15:conditionintervals                                              -0.03
## sd_diff15:conditionQDPs                                                   -0.06
## lo_ground_truth:start_meansTRUE                                           -0.26
## meansTRUE:start_meansTRUE                                                 -0.07
## sd_diff15:start_meansTRUE                                                 -0.04
## conditionHOPs:start_meansTRUE                                              0.01
## conditionintervals:start_meansTRUE                                        -0.05
## conditionQDPs:start_meansTRUE                                             -0.07
## lo_ground_truth:trial                                                      0.06
## conditionHOPs:trial                                                       -0.07
## conditionintervals:trial                                                  -0.02
## conditionQDPs:trial                                                       -0.01
## lo_ground_truth:meansTRUE:sd_diff15                                       -0.01
## lo_ground_truth:meansTRUE:conditionHOPs                                   -0.15
## lo_ground_truth:meansTRUE:conditionintervals                              -0.07
## lo_ground_truth:meansTRUE:conditionQDPs                                   -0.06
## lo_ground_truth:sd_diff15:conditionHOPs                                   -0.01
## lo_ground_truth:sd_diff15:conditionintervals                              -0.06
## lo_ground_truth:sd_diff15:conditionQDPs                                   -0.03
## meansTRUE:sd_diff15:conditionHOPs                                         -0.09
## meansTRUE:sd_diff15:conditionintervals                                    -0.08
## meansTRUE:sd_diff15:conditionQDPs                                         -0.06
## lo_ground_truth:meansTRUE:start_meansTRUE                                 -0.02
## lo_ground_truth:sd_diff15:start_meansTRUE                                 -0.02
## meansTRUE:sd_diff15:start_meansTRUE                                       -0.07
## lo_ground_truth:conditionHOPs:start_meansTRUE                             -0.25
## lo_ground_truth:conditionintervals:start_meansTRUE                        -0.13
## lo_ground_truth:conditionQDPs:start_meansTRUE                             -0.03
## meansTRUE:conditionHOPs:start_meansTRUE                                   -0.19
## meansTRUE:conditionintervals:start_meansTRUE                              -0.07
## meansTRUE:conditionQDPs:start_meansTRUE                                   -0.06
## sd_diff15:conditionHOPs:start_meansTRUE                                   -0.11
## sd_diff15:conditionintervals:start_meansTRUE                              -0.08
## sd_diff15:conditionQDPs:start_meansTRUE                                   -0.09
## lo_ground_truth:conditionHOPs:trial                                       -0.12
## lo_ground_truth:conditionintervals:trial                                  -0.08
## lo_ground_truth:conditionQDPs:trial                                       -0.09
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs                         -0.09
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals                    -0.03
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs                         -0.09
## lo_ground_truth:meansTRUE:sd_diff15:start_meansTRUE                       -0.01
## lo_ground_truth:meansTRUE:conditionHOPs:start_meansTRUE                    0.02
## lo_ground_truth:meansTRUE:conditionintervals:start_meansTRUE              -0.05
## lo_ground_truth:meansTRUE:conditionQDPs:start_meansTRUE                   -0.08
## lo_ground_truth:sd_diff15:conditionHOPs:start_meansTRUE                   -0.05
## lo_ground_truth:sd_diff15:conditionintervals:start_meansTRUE              -0.04
## lo_ground_truth:sd_diff15:conditionQDPs:start_meansTRUE                   -0.07
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE                         -0.06
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE                    -0.05
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE                         -0.06
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE         -0.17
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE    -0.12
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE         -0.08
## sigma_lo_ground_truth                                                      0.39
## sigma_conditionHOPs                                                        0.36
## sigma_conditionintervals                                                  -0.07
## sigma_conditionQDPs                                                       -0.29
## sigma_trial                                                               -0.67
## sigma_meansTRUE                                                           -0.06
## sigma_start_meansTRUE                                                     -0.18
## sigma_lo_ground_truth:conditionHOPs                                       -0.27
## sigma_lo_ground_truth:conditionintervals                                  -0.20
## sigma_lo_ground_truth:conditionQDPs                                       -0.12
## sigma_lo_ground_truth:trial                                               -0.07
## sigma_conditionHOPs:trial                                                 -0.19
## sigma_conditionintervals:trial                                            -0.14
## sigma_conditionQDPs:trial                                                 -0.31
## sigma_meansTRUE:start_meansTRUE                                           -0.32
## sigma_lo_ground_truth:conditionHOPs:trial                                 -0.09
## sigma_lo_ground_truth:conditionintervals:trial                            -0.08
## sigma_lo_ground_truth:conditionQDPs:trial                                 -0.16
##                                                                        u-95% CI
## Intercept                                                                  0.01
## sigma_Intercept                                                           -1.54
## lo_ground_truth                                                            0.54
## meansTRUE                                                                  0.03
## sd_diff15                                                                  0.08
## conditionHOPs                                                             -0.03
## conditionintervals                                                         0.03
## conditionQDPs                                                              0.06
## start_meansTRUE                                                            0.05
## trial                                                                     -0.01
## lo_ground_truth:meansTRUE                                                 -0.00
## lo_ground_truth:sd_diff15                                                  0.12
## meansTRUE:sd_diff15                                                        0.06
## lo_ground_truth:conditionHOPs                                              0.11
## lo_ground_truth:conditionintervals                                         0.02
## lo_ground_truth:conditionQDPs                                              0.19
## meansTRUE:conditionHOPs                                                    0.15
## meansTRUE:conditionintervals                                               0.06
## meansTRUE:conditionQDPs                                                    0.03
## sd_diff15:conditionHOPs                                                    0.10
## sd_diff15:conditionintervals                                               0.07
## sd_diff15:conditionQDPs                                                    0.05
## lo_ground_truth:start_meansTRUE                                           -0.02
## meansTRUE:start_meansTRUE                                                  0.04
## sd_diff15:start_meansTRUE                                                  0.06
## conditionHOPs:start_meansTRUE                                              0.15
## conditionintervals:start_meansTRUE                                         0.06
## conditionQDPs:start_meansTRUE                                              0.04
## lo_ground_truth:trial                                                      0.18
## conditionHOPs:trial                                                        0.09
## conditionintervals:trial                                                   0.09
## conditionQDPs:trial                                                        0.10
## lo_ground_truth:meansTRUE:sd_diff15                                        0.09
## lo_ground_truth:meansTRUE:conditionHOPs                                   -0.01
## lo_ground_truth:meansTRUE:conditionintervals                               0.04
## lo_ground_truth:meansTRUE:conditionQDPs                                    0.05
## lo_ground_truth:sd_diff15:conditionHOPs                                    0.11
## lo_ground_truth:sd_diff15:conditionintervals                               0.03
## lo_ground_truth:sd_diff15:conditionQDPs                                    0.07
## meansTRUE:sd_diff15:conditionHOPs                                          0.07
## meansTRUE:sd_diff15:conditionintervals                                     0.04
## meansTRUE:sd_diff15:conditionQDPs                                          0.06
## lo_ground_truth:meansTRUE:start_meansTRUE                                  0.09
## lo_ground_truth:sd_diff15:start_meansTRUE                                  0.06
## meansTRUE:sd_diff15:start_meansTRUE                                        0.04
## lo_ground_truth:conditionHOPs:start_meansTRUE                              0.11
## lo_ground_truth:conditionintervals:start_meansTRUE                         0.21
## lo_ground_truth:conditionQDPs:start_meansTRUE                              0.31
## meansTRUE:conditionHOPs:start_meansTRUE                                    0.01
## meansTRUE:conditionintervals:start_meansTRUE                               0.08
## meansTRUE:conditionQDPs:start_meansTRUE                                    0.09
## sd_diff15:conditionHOPs:start_meansTRUE                                    0.07
## sd_diff15:conditionintervals:start_meansTRUE                               0.06
## sd_diff15:conditionQDPs:start_meansTRUE                                    0.05
## lo_ground_truth:conditionHOPs:trial                                        0.08
## lo_ground_truth:conditionintervals:trial                                   0.09
## lo_ground_truth:conditionQDPs:trial                                        0.09
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs                          0.06
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals                     0.08
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs                          0.03
## lo_ground_truth:meansTRUE:sd_diff15:start_meansTRUE                        0.10
## lo_ground_truth:meansTRUE:conditionHOPs:start_meansTRUE                    0.22
## lo_ground_truth:meansTRUE:conditionintervals:start_meansTRUE               0.11
## lo_ground_truth:meansTRUE:conditionQDPs:start_meansTRUE                    0.07
## lo_ground_truth:sd_diff15:conditionHOPs:start_meansTRUE                    0.09
## lo_ground_truth:sd_diff15:conditionintervals:start_meansTRUE               0.06
## lo_ground_truth:sd_diff15:conditionQDPs:start_meansTRUE                    0.05
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE                          0.15
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE                     0.10
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE                          0.10
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE          0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE     0.03
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE          0.07
## sigma_lo_ground_truth                                                      0.52
## sigma_conditionHOPs                                                        0.83
## sigma_conditionintervals                                                   0.41
## sigma_conditionQDPs                                                        0.20
## sigma_trial                                                               -0.26
## sigma_meansTRUE                                                            0.06
## sigma_start_meansTRUE                                                      0.09
## sigma_lo_ground_truth:conditionHOPs                                       -0.09
## sigma_lo_ground_truth:conditionintervals                                  -0.01
## sigma_lo_ground_truth:conditionQDPs                                        0.06
## sigma_lo_ground_truth:trial                                                0.12
## sigma_conditionHOPs:trial                                                  0.37
## sigma_conditionintervals:trial                                             0.42
## sigma_conditionQDPs:trial                                                  0.24
## sigma_meansTRUE:start_meansTRUE                                           -0.13
## sigma_lo_ground_truth:conditionHOPs:trial                                  0.18
## sigma_lo_ground_truth:conditionintervals:trial                             0.20
## sigma_lo_ground_truth:conditionQDPs:trial                                  0.11
##                                                                        Rhat
## Intercept                                                              1.00
## sigma_Intercept                                                        1.00
## lo_ground_truth                                                        1.00
## meansTRUE                                                              1.00
## sd_diff15                                                              1.00
## conditionHOPs                                                          1.00
## conditionintervals                                                     1.00
## conditionQDPs                                                          1.00
## start_meansTRUE                                                        1.00
## trial                                                                  1.00
## lo_ground_truth:meansTRUE                                              1.00
## lo_ground_truth:sd_diff15                                              1.00
## meansTRUE:sd_diff15                                                    1.00
## lo_ground_truth:conditionHOPs                                          1.00
## lo_ground_truth:conditionintervals                                     1.00
## lo_ground_truth:conditionQDPs                                          1.00
## meansTRUE:conditionHOPs                                                1.00
## meansTRUE:conditionintervals                                           1.00
## meansTRUE:conditionQDPs                                                1.00
## sd_diff15:conditionHOPs                                                1.00
## sd_diff15:conditionintervals                                           1.00
## sd_diff15:conditionQDPs                                                1.00
## lo_ground_truth:start_meansTRUE                                        1.00
## meansTRUE:start_meansTRUE                                              1.00
## sd_diff15:start_meansTRUE                                              1.00
## conditionHOPs:start_meansTRUE                                          1.00
## conditionintervals:start_meansTRUE                                     1.00
## conditionQDPs:start_meansTRUE                                          1.00
## lo_ground_truth:trial                                                  1.00
## conditionHOPs:trial                                                    1.00
## conditionintervals:trial                                               1.00
## conditionQDPs:trial                                                    1.00
## lo_ground_truth:meansTRUE:sd_diff15                                    1.00
## lo_ground_truth:meansTRUE:conditionHOPs                                1.00
## lo_ground_truth:meansTRUE:conditionintervals                           1.00
## lo_ground_truth:meansTRUE:conditionQDPs                                1.00
## lo_ground_truth:sd_diff15:conditionHOPs                                1.00
## lo_ground_truth:sd_diff15:conditionintervals                           1.00
## lo_ground_truth:sd_diff15:conditionQDPs                                1.00
## meansTRUE:sd_diff15:conditionHOPs                                      1.00
## meansTRUE:sd_diff15:conditionintervals                                 1.00
## meansTRUE:sd_diff15:conditionQDPs                                      1.00
## lo_ground_truth:meansTRUE:start_meansTRUE                              1.00
## lo_ground_truth:sd_diff15:start_meansTRUE                              1.00
## meansTRUE:sd_diff15:start_meansTRUE                                    1.00
## lo_ground_truth:conditionHOPs:start_meansTRUE                          1.00
## lo_ground_truth:conditionintervals:start_meansTRUE                     1.00
## lo_ground_truth:conditionQDPs:start_meansTRUE                          1.00
## meansTRUE:conditionHOPs:start_meansTRUE                                1.00
## meansTRUE:conditionintervals:start_meansTRUE                           1.00
## meansTRUE:conditionQDPs:start_meansTRUE                                1.00
## sd_diff15:conditionHOPs:start_meansTRUE                                1.00
## sd_diff15:conditionintervals:start_meansTRUE                           1.00
## sd_diff15:conditionQDPs:start_meansTRUE                                1.00
## lo_ground_truth:conditionHOPs:trial                                    1.00
## lo_ground_truth:conditionintervals:trial                               1.00
## lo_ground_truth:conditionQDPs:trial                                    1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs                      1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals                 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs                      1.00
## lo_ground_truth:meansTRUE:sd_diff15:start_meansTRUE                    1.00
## lo_ground_truth:meansTRUE:conditionHOPs:start_meansTRUE                1.00
## lo_ground_truth:meansTRUE:conditionintervals:start_meansTRUE           1.00
## lo_ground_truth:meansTRUE:conditionQDPs:start_meansTRUE                1.00
## lo_ground_truth:sd_diff15:conditionHOPs:start_meansTRUE                1.00
## lo_ground_truth:sd_diff15:conditionintervals:start_meansTRUE           1.00
## lo_ground_truth:sd_diff15:conditionQDPs:start_meansTRUE                1.00
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE                      1.00
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE                 1.00
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE                      1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE      1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE      1.00
## sigma_lo_ground_truth                                                  1.00
## sigma_conditionHOPs                                                    1.00
## sigma_conditionintervals                                               1.00
## sigma_conditionQDPs                                                    1.00
## sigma_trial                                                            1.00
## sigma_meansTRUE                                                        1.00
## sigma_start_meansTRUE                                                  1.00
## sigma_lo_ground_truth:conditionHOPs                                    1.00
## sigma_lo_ground_truth:conditionintervals                               1.00
## sigma_lo_ground_truth:conditionQDPs                                    1.00
## sigma_lo_ground_truth:trial                                            1.00
## sigma_conditionHOPs:trial                                              1.00
## sigma_conditionintervals:trial                                         1.00
## sigma_conditionQDPs:trial                                              1.00
## sigma_meansTRUE:start_meansTRUE                                        1.00
## sigma_lo_ground_truth:conditionHOPs:trial                              1.00
## sigma_lo_ground_truth:conditionintervals:trial                         1.00
## sigma_lo_ground_truth:conditionQDPs:trial                              1.00
##                                                                        Bulk_ESS
## Intercept                                                                  3924
## sigma_Intercept                                                            2585
## lo_ground_truth                                                            4594
## meansTRUE                                                                  3430
## sd_diff15                                                                  4192
## conditionHOPs                                                              4624
## conditionintervals                                                         4464
## conditionQDPs                                                              3974
## start_meansTRUE                                                            3831
## trial                                                                      4666
## lo_ground_truth:meansTRUE                                                  3466
## lo_ground_truth:sd_diff15                                                  3578
## meansTRUE:sd_diff15                                                        3710
## lo_ground_truth:conditionHOPs                                              5234
## lo_ground_truth:conditionintervals                                         5029
## lo_ground_truth:conditionQDPs                                              4672
## meansTRUE:conditionHOPs                                                    4248
## meansTRUE:conditionintervals                                               3939
## meansTRUE:conditionQDPs                                                    3901
## sd_diff15:conditionHOPs                                                    4952
## sd_diff15:conditionintervals                                               4834
## sd_diff15:conditionQDPs                                                    4946
## lo_ground_truth:start_meansTRUE                                            4691
## meansTRUE:start_meansTRUE                                                  3474
## sd_diff15:start_meansTRUE                                                  4225
## conditionHOPs:start_meansTRUE                                              4886
## conditionintervals:start_meansTRUE                                         4380
## conditionQDPs:start_meansTRUE                                              4003
## lo_ground_truth:trial                                                      5324
## conditionHOPs:trial                                                        5802
## conditionintervals:trial                                                   4910
## conditionQDPs:trial                                                        5188
## lo_ground_truth:meansTRUE:sd_diff15                                        3618
## lo_ground_truth:meansTRUE:conditionHOPs                                    4084
## lo_ground_truth:meansTRUE:conditionintervals                               3972
## lo_ground_truth:meansTRUE:conditionQDPs                                    4126
## lo_ground_truth:sd_diff15:conditionHOPs                                    4734
## lo_ground_truth:sd_diff15:conditionintervals                               4104
## lo_ground_truth:sd_diff15:conditionQDPs                                    3918
## meansTRUE:sd_diff15:conditionHOPs                                          4782
## meansTRUE:sd_diff15:conditionintervals                                     4410
## meansTRUE:sd_diff15:conditionQDPs                                          4249
## lo_ground_truth:meansTRUE:start_meansTRUE                                  3451
## lo_ground_truth:sd_diff15:start_meansTRUE                                  3728
## meansTRUE:sd_diff15:start_meansTRUE                                        3787
## lo_ground_truth:conditionHOPs:start_meansTRUE                              5402
## lo_ground_truth:conditionintervals:start_meansTRUE                         5033
## lo_ground_truth:conditionQDPs:start_meansTRUE                              4788
## meansTRUE:conditionHOPs:start_meansTRUE                                    4442
## meansTRUE:conditionintervals:start_meansTRUE                               3838
## meansTRUE:conditionQDPs:start_meansTRUE                                    3889
## sd_diff15:conditionHOPs:start_meansTRUE                                    5159
## sd_diff15:conditionintervals:start_meansTRUE                               5152
## sd_diff15:conditionQDPs:start_meansTRUE                                    5108
## lo_ground_truth:conditionHOPs:trial                                        6163
## lo_ground_truth:conditionintervals:trial                                   5776
## lo_ground_truth:conditionQDPs:trial                                        5577
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs                          4466
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals                     4015
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs                          4034
## lo_ground_truth:meansTRUE:sd_diff15:start_meansTRUE                        3803
## lo_ground_truth:meansTRUE:conditionHOPs:start_meansTRUE                    4153
## lo_ground_truth:meansTRUE:conditionintervals:start_meansTRUE               3911
## lo_ground_truth:meansTRUE:conditionQDPs:start_meansTRUE                    4041
## lo_ground_truth:sd_diff15:conditionHOPs:start_meansTRUE                    5020
## lo_ground_truth:sd_diff15:conditionintervals:start_meansTRUE               4486
## lo_ground_truth:sd_diff15:conditionQDPs:start_meansTRUE                    4676
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE                          5278
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE                     4595
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE                          4353
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE          4850
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE     4310
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE          4213
## sigma_lo_ground_truth                                                      3727
## sigma_conditionHOPs                                                        2626
## sigma_conditionintervals                                                   2458
## sigma_conditionQDPs                                                        2225
## sigma_trial                                                                6609
## sigma_meansTRUE                                                            9256
## sigma_start_meansTRUE                                                      3124
## sigma_lo_ground_truth:conditionHOPs                                        4008
## sigma_lo_ground_truth:conditionintervals                                   3311
## sigma_lo_ground_truth:conditionQDPs                                        3568
## sigma_lo_ground_truth:trial                                                8234
## sigma_conditionHOPs:trial                                                  6675
## sigma_conditionintervals:trial                                             6981
## sigma_conditionQDPs:trial                                                  7431
## sigma_meansTRUE:start_meansTRUE                                            9057
## sigma_lo_ground_truth:conditionHOPs:trial                                  8181
## sigma_lo_ground_truth:conditionintervals:trial                             8377
## sigma_lo_ground_truth:conditionQDPs:trial                                  8103
##                                                                        Tail_ESS
## Intercept                                                                  6057
## sigma_Intercept                                                            4770
## lo_ground_truth                                                            6984
## meansTRUE                                                                  5961
## sd_diff15                                                                  6348
## conditionHOPs                                                              7735
## conditionintervals                                                         6604
## conditionQDPs                                                              6359
## start_meansTRUE                                                            5965
## trial                                                                      7078
## lo_ground_truth:meansTRUE                                                  6159
## lo_ground_truth:sd_diff15                                                  5348
## meansTRUE:sd_diff15                                                        6548
## lo_ground_truth:conditionHOPs                                              7679
## lo_ground_truth:conditionintervals                                         7084
## lo_ground_truth:conditionQDPs                                              7120
## meansTRUE:conditionHOPs                                                    7366
## meansTRUE:conditionintervals                                               6680
## meansTRUE:conditionQDPs                                                    6540
## sd_diff15:conditionHOPs                                                    7733
## sd_diff15:conditionintervals                                               7238
## sd_diff15:conditionQDPs                                                    6504
## lo_ground_truth:start_meansTRUE                                            6881
## meansTRUE:start_meansTRUE                                                  5693
## sd_diff15:start_meansTRUE                                                  7179
## conditionHOPs:start_meansTRUE                                              7785
## conditionintervals:start_meansTRUE                                         6984
## conditionQDPs:start_meansTRUE                                              6740
## lo_ground_truth:trial                                                      7303
## conditionHOPs:trial                                                        8127
## conditionintervals:trial                                                   7612
## conditionQDPs:trial                                                        7242
## lo_ground_truth:meansTRUE:sd_diff15                                        5876
## lo_ground_truth:meansTRUE:conditionHOPs                                    6653
## lo_ground_truth:meansTRUE:conditionintervals                               6798
## lo_ground_truth:meansTRUE:conditionQDPs                                    6750
## lo_ground_truth:sd_diff15:conditionHOPs                                    7731
## lo_ground_truth:sd_diff15:conditionintervals                               6018
## lo_ground_truth:sd_diff15:conditionQDPs                                    6737
## meansTRUE:sd_diff15:conditionHOPs                                          7564
## meansTRUE:sd_diff15:conditionintervals                                     7271
## meansTRUE:sd_diff15:conditionQDPs                                          6517
## lo_ground_truth:meansTRUE:start_meansTRUE                                  5677
## lo_ground_truth:sd_diff15:start_meansTRUE                                  6491
## meansTRUE:sd_diff15:start_meansTRUE                                        6134
## lo_ground_truth:conditionHOPs:start_meansTRUE                              7492
## lo_ground_truth:conditionintervals:start_meansTRUE                         7325
## lo_ground_truth:conditionQDPs:start_meansTRUE                              7088
## meansTRUE:conditionHOPs:start_meansTRUE                                    7153
## meansTRUE:conditionintervals:start_meansTRUE                               5911
## meansTRUE:conditionQDPs:start_meansTRUE                                    5949
## sd_diff15:conditionHOPs:start_meansTRUE                                    7609
## sd_diff15:conditionintervals:start_meansTRUE                               7213
## sd_diff15:conditionQDPs:start_meansTRUE                                    7802
## lo_ground_truth:conditionHOPs:trial                                        8303
## lo_ground_truth:conditionintervals:trial                                   7710
## lo_ground_truth:conditionQDPs:trial                                        6998
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs                          7493
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals                     6492
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs                          6440
## lo_ground_truth:meansTRUE:sd_diff15:start_meansTRUE                        6349
## lo_ground_truth:meansTRUE:conditionHOPs:start_meansTRUE                    6671
## lo_ground_truth:meansTRUE:conditionintervals:start_meansTRUE               6719
## lo_ground_truth:meansTRUE:conditionQDPs:start_meansTRUE                    6116
## lo_ground_truth:sd_diff15:conditionHOPs:start_meansTRUE                    7744
## lo_ground_truth:sd_diff15:conditionintervals:start_meansTRUE               7303
## lo_ground_truth:sd_diff15:conditionQDPs:start_meansTRUE                    6843
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE                          7007
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE                     7513
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE                          7164
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE          6958
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE     6855
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE          6977
## sigma_lo_ground_truth                                                      6198
## sigma_conditionHOPs                                                        4968
## sigma_conditionintervals                                                   4718
## sigma_conditionQDPs                                                        4207
## sigma_trial                                                                8221
## sigma_meansTRUE                                                            8800
## sigma_start_meansTRUE                                                      5883
## sigma_lo_ground_truth:conditionHOPs                                        7173
## sigma_lo_ground_truth:conditionintervals                                   6236
## sigma_lo_ground_truth:conditionQDPs                                        6196
## sigma_lo_ground_truth:trial                                                8697
## sigma_conditionHOPs:trial                                                  8350
## sigma_conditionintervals:trial                                             8367
## sigma_conditionQDPs:trial                                                  9156
## sigma_meansTRUE:start_meansTRUE                                            8632
## sigma_lo_ground_truth:conditionHOPs:trial                                  8474
## sigma_lo_ground_truth:conditionintervals:trial                             8765
## sigma_lo_ground_truth:conditionQDPs:trial                                  8967
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

Effects of Adding Means

The primary results about probability of superiority that we present in the paper concern the three-way interaction between the ground truth probability of superiority, the presence or absence of extrinsic means, and the level of variance shown (lo_ground_truth*means*sd_diff) for each uncertainty visualization format we tested. To show this effect, we look at how the slope of the linear in log odds (LLO) model changes as a function of extrinsic means, variance shown, and visualization format. The charts below highlight this effect.

model_df %>%
  group_by(means, sd_diff, condition, trial, start_means) %>%
  data_grid(lo_ground_truth = c(0, 1)) %>%         # get fitted draws (in log odds units) only for ground truth of 0 and 1
  add_fitted_draws(m.p_sup, re_formula = NA) %>%
  compare_levels(.value, by = lo_ground_truth) %>% # calculate the difference between fits at 1 and 0 (i.e., slope)
  rename(slope = .value) %>%
  group_by(means, sd_diff, condition, .draw) %>%   # group by predictors to keep
  summarise(slope = mean(slope)) %>%               # marginalize out other predictors by taking a weighted average
  ggplot(aes(x = slope, y = condition, group = means, fill = means)) +
  stat_slabh(alpha = 0.35) +
  labs(
    title = "Slopes in Linear Log Odds Model",
    x = "Slope",
    y = "Visualization",
    fill = "Means Present"
  ) +
  theme_minimal() +
  facet_grid(sd_diff ~ .)
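
The slope calculation above relies on the fact that, for a model that is linear in log odds, the difference between fitted values at lo_ground_truth = 1 and lo_ground_truth = 0 is the slope itself. The following is a minimal base R sketch of that idea. The coefficients and the helper predict_llo are hypothetical values chosen for illustration, not estimates from m.p_sup.

```r
# Hypothetical LLO coefficients, for illustration only (not taken from m.p_sup)
llo_intercept <- 0
llo_slope <- 0.45

# Linear in log odds prediction: log odds of the response as a linear
# function of the log odds of the ground truth
predict_llo <- function(lo_ground_truth) {
  llo_intercept + llo_slope * lo_ground_truth
}

# The difference between predictions at lo_ground_truth = 1 and 0 recovers the slope
slope_from_grid <- predict_llo(1) - predict_llo(0)
print(slope_from_grid)

# On the probability scale, a slope below 1 (with a zero intercept) compresses
# responses toward 50%, i.e., probability of superiority is underestimated
p_truth <- c(0.6, 0.8, 0.95)
p_response <- plogis(predict_llo(qlogis(p_truth)))
print(round(p_response, 3))
```

In the pipeline above, the same difference is computed per posterior draw, so each draw yields its own slope and the result is a full posterior distribution of slopes.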

We’ll break this chart down into contrasts and contrasts of contrasts to do some visual reliability testing.

model_df %>%
  group_by(means, sd_diff, condition, trial, start_means) %>%
  data_grid(lo_ground_truth = c(0, 1)) %>%         # get fitted draws (in log odds units) only for ground truth of 0 and 1
  add_fitted_draws(m.p_sup, re_formula = NA) %>%
  compare_levels(.value, by = lo_ground_truth) %>% # calculate the difference between fits at 1 and 0 (i.e., slope)
  rename(slope = .value) %>%
  group_by(means, sd_diff, condition, .draw) %>%   # group by predictors to keep
  summarise(slope = mean(slope)) %>%               # marginalize out other predictors by taking a weighted average
  compare_levels(slope, by = means) %>%            # contrast mean present - absent
  ggplot(aes(x = slope, y = condition)) +
  stat_halfeyeh() +
  labs(
    title = "Effect of Means on LLO Slopes",
    x = "Slope Difference (Means present - absent)",
    y = "Visualization"
  ) +
  theme_minimal() +
  facet_grid(sd_diff ~ .)

model_df %>%
  group_by(means, sd_diff, condition, trial, start_means) %>%
  data_grid(lo_ground_truth = c(0, 1)) %>%         # get fitted draws (in log odds units) only for ground truth of 0 and 1
  add_fitted_draws(m.p_sup, re_formula = NA) %>%
  compare_levels(.value, by = lo_ground_truth) %>% # calculate the difference between fits at 1 and 0 (i.e., slope)
  rename(slope = .value) %>%
  group_by(means, sd_diff, condition, .draw) %>%   # group by predictors to keep
  summarise(slope = mean(slope)) %>%               # marginalize out other predictors by taking a weighted average
  compare_levels(slope, by = means) %>%            # contrast mean present - absent
  compare_levels(slope, by = sd_diff) %>%          # contrast sd_diff high - low (I think)
  ggplot(aes(x = slope, y = condition)) +
  stat_halfeyeh() +
  labs(
    title = "Effect of Variance on the Effect of Extrinsic Means",
    x = "Difference in Slope Differences (Effect of means at high - low uncertainty)",
    y = "Visualization"
  ) +
  theme_minimal()
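
To make the logic of a contrast of contrasts concrete, here is a minimal base R sketch on simulated draws. The slope values are made up and the helper cell() is hypothetical; the point is only that each difference is taken within a posterior draw, so the result is itself a distribution over draws.

```r
# Simulated posterior draws of slopes (hypothetical values, not model output)
set.seed(2)
n_draws <- 1000
draws <- expand.grid(
  .draw   = seq_len(n_draws),
  means   = c(FALSE, TRUE),
  sd_diff = c(5, 15)
)
draws$slope <- rnorm(nrow(draws), mean = 0.4, sd = 0.05)

# Hypothetical helper: slope draws for one cell, ordered by draw index
cell <- function(m, s) {
  d <- draws[draws$means == m & draws$sd_diff == s, ]
  d$slope[order(d$.draw)]
}

# First-order contrast: effect of means (present - absent) within each draw
effect_low  <- cell(TRUE, 5)  - cell(FALSE, 5)
effect_high <- cell(TRUE, 15) - cell(FALSE, 15)

# Second-order contrast: effect of means at high - low variance within each draw
diff_of_diffs <- effect_high - effect_low
summary(diff_of_diffs)
```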

It looks like extrinsic means lead to greater underestimation of probability of superiority (lower LLO slopes) when variance is low, regardless of visualization condition. This is the effect we expected to see. Surprisingly, the impact of extrinsic means does not seem to depend on the intrinsic salience of the mean in the uncertainty visualization conditions. At high levels of variance, extrinsic means improve slopes for intervals and densities but still reduce slopes for HOPs.

Effect of means on slopes for each combination of visualization condition and level of variance (in figure).

model_df %>%
  group_by(means, sd_diff, condition, trial, start_means) %>%
  data_grid(lo_ground_truth = c(0, 1)) %>%          # get fitted draws (in log odds units) only for ground truth of 0 and 1
  add_fitted_draws(m.p_sup, re_formula = NA) %>%
  compare_levels(.value, by = lo_ground_truth) %>%  # calculate the difference between fits at 1 and 0 (i.e., slope)
  rename(slope = .value) %>%
  group_by(means, condition, sd_diff, .draw) %>%               # group by predictors to keep
  summarise(slope = mean(slope)) %>%                # marginalize out other predictors by taking a weighted average
  compare_levels(slope, by = means) %>%             # contrast mean present - absent
  mean_qi()
## # A tibble: 8 x 9
## # Groups:   means, condition [4]
##   means      condition sd_diff    slope  .lower   .upper .width .point .interval
##   <fct>      <fct>     <fct>      <dbl>   <dbl>    <dbl>  <dbl> <chr>  <chr>    
## 1 TRUE - FA… densities 5       -0.0250  -0.0479 -0.00232   0.95 mean   qi       
## 2 TRUE - FA… densities 15       0.0385   0.0136  0.0633    0.95 mean   qi       
## 3 TRUE - FA… intervals 5       -0.0230  -0.0430 -0.00350   0.95 mean   qi       
## 4 TRUE - FA… intervals 15       0.0446   0.0245  0.0641    0.95 mean   qi       
## 5 TRUE - FA… HOPs      5       -0.0439  -0.0746 -0.0135    0.95 mean   qi       
## 6 TRUE - FA… HOPs      15      -0.0376  -0.0713 -0.00404   0.95 mean   qi       
## 7 TRUE - FA… QDPs      5       -0.0337  -0.0551 -0.0127    0.95 mean   qi       
## 8 TRUE - FA… QDPs      15      -0.00554 -0.0290  0.0180    0.95 mean   qi
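
For readers unfamiliar with the summary columns above, the following sketch shows roughly what a mean with a 95% quantile interval amounts to for a single group of draws. The draws are simulated with made-up parameters; the name summary_row is ours and is not part of any package.

```r
# Hypothetical draws of a slope difference (not output from m.p_sup)
set.seed(3)
slope_diff <- rnorm(6000, mean = -0.03, sd = 0.01)

# Point estimate is the mean of the draws; the interval runs from the
# 2.5th to the 97.5th percentile of the draws
summary_row <- c(
  slope  = mean(slope_diff),
  .lower = unname(quantile(slope_diff, 0.025)),
  .upper = unname(quantile(slope_diff, 0.975))
)
summary_row
```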

Effect of adding means on predicted error for each combination of visualization condition and level of variance. This helps us contextualize the impact of adding means.

model_df %>%
  data_grid(lo_ground_truth, means, sd_diff, condition, trial, start_means) %>%
  add_predicted_draws(m.p_sup, re_formula = NA, n = 5000, seed = 1234) %>%
  mutate(est_error = plogis(.prediction) - plogis(lo_ground_truth)) %>% # calculate estimation error
  compare_levels(est_error, by = means) %>%                             # contrast mean present - absent
  group_by(means, condition, sd_diff) %>%                               # group by predictors to keep
  mean_qi(est_error)
## # A tibble: 8 x 9
## # Groups:   means, condition [4]
##   means        condition sd_diff est_error .lower .upper .width .point .interval
##   <fct>        <fct>     <fct>       <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
## 1 TRUE - FALSE densities 5        -0.00942 -0.236  0.216   0.95 mean   qi       
## 2 TRUE - FALSE densities 15        0.0102  -0.188  0.222   0.95 mean   qi       
## 3 TRUE - FALSE intervals 5        -0.00476 -0.236  0.222   0.95 mean   qi       
## 4 TRUE - FALSE intervals 15        0.0150  -0.191  0.231   0.95 mean   qi       
## 5 TRUE - FALSE HOPs      5        -0.00507 -0.304  0.289   0.95 mean   qi       
## 6 TRUE - FALSE HOPs      15        0.00360 -0.266  0.275   0.95 mean   qi       
## 7 TRUE - FALSE QDPs      5        -0.0128  -0.201  0.172   0.95 mean   qi       
## 8 TRUE - FALSE QDPs      15       -0.00133 -0.170  0.168   0.95 mean   qi
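
The est_error values above are in probability units. To give a sense of how an LLO slope below 1 translates into error on that scale, here is a small worked sketch. The intercept of 0 and slope of 0.5 are hypothetical, chosen for easy arithmetic, and are not estimates from m.p_sup.

```r
# Hypothetical LLO parameters (not estimates from m.p_sup)
intercept <- 0
slope     <- 0.5

ground_truth <- c(0.6, 0.75, 0.9)             # true probability of superiority
lo_truth     <- qlogis(ground_truth)          # convert to log odds
lo_response  <- intercept + slope * lo_truth  # LLO prediction in log odds
response     <- plogis(lo_response)           # back to probability units

# Negative values mean the response underestimates the ground truth
est_error <- response - ground_truth
round(data.frame(ground_truth, response, est_error), 3)
```

With these made-up parameters, a ground truth of 0.9 maps to a response of 0.75, and the underestimation grows as the ground truth moves away from 0.5.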

Effect of means on slopes, marginalizing across visualization condition (in figure).

model_df %>%
  group_by(means, sd_diff, condition, trial, start_means) %>%
  data_grid(lo_ground_truth = c(0, 1)) %>%          # get fitted draws (in log odds units) only for ground truth of 0 and 1
  add_fitted_draws(m.p_sup, re_formula = NA) %>%
  compare_levels(.value, by = lo_ground_truth) %>%  # calculate the difference between fits at 1 and 0 (i.e., slope)
  rename(slope = .value) %>%
  group_by(means, sd_diff, .draw) %>%               # group by predictors to keep
  summarise(slope = mean(slope)) %>%                # marginalize out other predictors by taking a weighted average
  compare_levels(slope, by = means) %>%             # contrast mean present - absent
  mean_qi()
## # A tibble: 2 x 8
## # Groups:   means [1]
##   means        sd_diff    slope   .lower  .upper .width .point .interval
##   <fct>        <fct>      <dbl>    <dbl>   <dbl>  <dbl> <chr>  <chr>    
## 1 TRUE - FALSE 5       -0.0314  -0.0437  -0.0195   0.95 mean   qi       
## 2 TRUE - FALSE 15       0.00999 -0.00300  0.0229   0.95 mean   qi

Effect of adding means on predicted error, marginalizing across visualization conditions. This helps us contextualize the impact of adding means.

model_df %>%
  data_grid(lo_ground_truth, means, sd_diff, condition, trial, start_means) %>%
  add_predicted_draws(m.p_sup, re_formula = NA, n = 5000, seed = 1234) %>%
  mutate(est_error = plogis(.prediction) - plogis(lo_ground_truth)) %>% # calculate estimation error
  compare_levels(est_error, by = means) %>%                             # contrast mean present - absent
  group_by(means, sd_diff) %>%                                          # group by predictors to keep
  mean_qi(est_error)
## # A tibble: 2 x 8
## # Groups:   means [1]
##   means        sd_diff est_error .lower .upper .width .point .interval
##   <fct>        <fct>       <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
## 1 TRUE - FALSE 5        -0.00802 -0.250  0.234   0.95 mean   qi       
## 2 TRUE - FALSE 15        0.00689 -0.211  0.231   0.95 mean   qi

Visualization Effects

We preregistered comparisons of LLO slopes in each uncertainty visualization condition, marginalizing across other predictors. However, it occurred to us later that these effects are not that useful for making design recommendations. They represent uncertainty encodings that cannot be rendered: distributions which both do and do not have means added at the same time. This is a statistical abstraction that represents the effectiveness of uncertainty encodings averaging across other manipulations. As such, we present it here but omit comparisons averaging across the presence/absence of the mean from the paper.

model_df %>%
  group_by(means, sd_diff, condition, trial, start_means) %>%
  data_grid(lo_ground_truth = c(0, 1)) %>%          # get fitted draws (in log odds units) only for ground truth of 0 and 1
  add_fitted_draws(m.p_sup, re_formula = NA) %>%
  compare_levels(.value, by = lo_ground_truth) %>%  # calculate the difference between fits at 1 and 0 (i.e., slope)
  rename(slope = .value) %>%
  group_by(condition, .draw) %>%                    # group by predictors to keep
  summarise(slope = mean(slope)) %>%                # marginalize out means present/absent by taking a weighted average
  ggplot(aes(x = slope, y = condition, fill = condition)) +
  stat_slabh(alpha = 0.35) +
  scale_fill_brewer(type = "qual", palette = 2) +
  labs(subtitle = "Slopes Per Visualization Condition") +
  theme_minimal() +
  theme(legend.position = "none")

Let’s look at contrasts between visualization conditions to get a sense of which differences are reliable.

model_df %>%
  group_by(means, sd_diff, condition, trial, start_means) %>%
  data_grid(lo_ground_truth = c(0, 1)) %>%          # get fitted draws (in log odds units) only for ground truth of 0 and 1
  add_fitted_draws(m.p_sup, re_formula = NA) %>%
  compare_levels(.value, by = lo_ground_truth) %>%  # calculate the difference between fits at 1 and 0 (i.e., slope)
  rename(slope = .value) %>%
  group_by(condition, .draw) %>%                    # group by predictors to keep
  summarise(slope = mean(slope)) %>%                # marginalize out means present/absent by taking a weighted average
  compare_levels(slope, by = condition) %>%
  # compare_levels(slope, by = condition, comparison = list(c("QDPs", "intervals"), c("QDPs", "HOPs"), c("QDPs", "densities"), c("densities", "intervals"))) %>%                                  # show only reliable contrasts
  ggplot(aes(x = slope, y = condition)) +
  stat_halfeyeh() +
  labs(x = "Slope Differences Between Visualization Conditions") +
  theme_minimal()

The chart above shows that only the contrasts between quantile dotplots and each of the other conditions are reliable.

Slope estimates per visualization condition.

model_df %>%
  group_by(means, sd_diff, condition, trial, start_means) %>%
  data_grid(lo_ground_truth = c(0, 1)) %>%          # get fitted draws (in log odds units) only for ground truth of 0 and 1
  add_fitted_draws(m.p_sup, re_formula = NA) %>%
  compare_levels(.value, by = lo_ground_truth) %>%  # calculate the difference between fits at 1 and 0 (i.e., slope)
  rename(slope = .value) %>%
  group_by(condition, .draw) %>%                    # group by predictors to keep
  summarise(slope = mean(slope)) %>%
  mean_qi()
## # A tibble: 4 x 7
##   condition slope .lower .upper .width .point .interval
##   <fct>     <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
## 1 densities 0.436  0.372  0.499   0.95 mean   qi       
## 2 intervals 0.350  0.289  0.413   0.95 mean   qi       
## 3 HOPs      0.394  0.329  0.460   0.95 mean   qi       
## 4 QDPs      0.566  0.503  0.631   0.95 mean   qi

Predicted error per visualization condition.

model_df %>%
  data_grid(lo_ground_truth, means, sd_diff, condition, trial, start_means) %>%
  add_predicted_draws(m.p_sup, re_formula = NA, n = 5000, seed = 1234) %>%
  mutate(est_error = plogis(.prediction) - plogis(lo_ground_truth)) %>% # calculate estimation error
  group_by(condition) %>%                                               # group by predictors to keep
  mean_qi(est_error)
## # A tibble: 4 x 7
##   condition est_error .lower .upper .width .point .interval
##   <fct>         <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
## 1 densities   -0.124  -0.362 0.0287   0.95 mean   qi       
## 2 intervals   -0.146  -0.394 0.0295   0.95 mean   qi       
## 3 HOPs        -0.140  -0.426 0.0689   0.95 mean   qi       
## 4 QDPs        -0.0891 -0.270 0.0353   0.95 mean   qi

Effect of Visualization Design when Averaging Over Variance

Instead of the marginal effects of visualization conditions shown above, what we present in the paper are the effects of each visualization design (uncertainty encoding x means). This means that we are only marginalizing across levels of variance.

model_df %>%
  group_by(means, sd_diff, condition, trial, start_means) %>%
  data_grid(lo_ground_truth = c(0, 1)) %>%          # get fitted draws (in log odds units) only for ground truth of 0 and 1
  add_fitted_draws(m.p_sup, re_formula = NA) %>%
  compare_levels(.value, by = lo_ground_truth) %>%  # calculate the difference between fits at 1 and 0 (i.e., slope)
  rename(slope = .value) %>%
  group_by(condition, means, .draw) %>%             # group by predictors to keep
  summarise(slope = mean(slope)) %>%                # marginalize by taking a weighted average
  ggplot(aes(x = slope, y = condition, group = means, fill = means)) +
  stat_slabh(alpha = 0.35) +
  labs(subtitle = "Slopes Per Visualization Design") +
  theme_minimal() +
  theme(legend.position = "none")

Let’s look at contrasts between visualization designs to get a sense of which differences are reliable.

model_df %>%
  group_by(means, sd_diff, condition, trial, start_means) %>%
  data_grid(lo_ground_truth = c(0, 1)) %>%          # get fitted draws (in log odds units) only for ground truth of 0 and 1
  add_fitted_draws(m.p_sup, re_formula = NA) %>%
  compare_levels(.value, by = lo_ground_truth) %>%  # calculate the difference between fits at 1 and 0 (i.e., slope)
  rename(slope = .value) %>%
  unite("design", c(condition, means)) %>%
  group_by(design, .draw) %>%                       # group by predictors to keep
  summarise(slope = mean(slope)) %>%                # marginalize by taking a weighted average
  compare_levels(slope, by = design) %>%
  ggplot(aes(x = slope, y = design)) +
  stat_halfeyeh() +
  labs(x = "Slope Differences Between Visualization Designs") +
  theme_minimal()

Quantile dotplots outperform any other condition with or without means added. Densities with and without means are reliably better than intervals without means. HOPs are not reliably different from intervals or densities with or without means added. The effect of adding means is only reliable for HOPs, but we can see below that the predicted error only changes by a negligible 0.08 percentage points in terms of probability of superiority.

Effect of means on slopes, marginalizing across levels of variance (in figure).

model_df %>%
  group_by(means, sd_diff, condition, trial, start_means) %>%
  data_grid(lo_ground_truth = c(0, 1)) %>%          # get fitted draws (in log odds units) only for ground truth of 0 and 1
  add_fitted_draws(m.p_sup, re_formula = NA) %>%
  compare_levels(.value, by = lo_ground_truth) %>%  # calculate the difference between fits at 1 and 0 (i.e., slope)
  rename(slope = .value) %>%
  group_by(means, condition, .draw) %>%             # group by predictors to keep
  summarise(slope = mean(slope)) %>%                # marginalize by taking a weighted average
  compare_levels(slope, by = means) %>%             # contrast mean present - absent
  mean_qi()
## # A tibble: 4 x 8
## # Groups:   means [1]
##   means        condition    slope   .lower   .upper .width .point .interval
##   <fct>        <fct>        <dbl>    <dbl>    <dbl>  <dbl> <chr>  <chr>    
## 1 TRUE - FALSE densities  0.00677 -0.0125   0.0256    0.95 mean   qi       
## 2 TRUE - FALSE intervals  0.0108  -0.00543  0.0267    0.95 mean   qi       
## 3 TRUE - FALSE HOPs      -0.0408  -0.0669  -0.0146    0.95 mean   qi       
## 4 TRUE - FALSE QDPs      -0.0196  -0.0381  -0.00105   0.95 mean   qi

Predicted error with and without means, marginalizing across levels of variance. This helps us give a sense of visualization effectiveness.

model_df %>%
  data_grid(lo_ground_truth, means, sd_diff, condition, trial, start_means) %>%
  add_predicted_draws(m.p_sup, re_formula = NA, n = 5000, seed = 1234) %>%
  mutate(est_error = plogis(.prediction) - plogis(lo_ground_truth)) %>% # calculate estimation error
  group_by(means, condition) %>%                                        # group by predictors to keep
  mean_qi(est_error)
## # A tibble: 8 x 8
## # Groups:   means [2]
##   means condition est_error .lower .upper .width .point .interval
##   <fct> <fct>         <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
## 1 FALSE densities   -0.124  -0.369 0.0317   0.95 mean   qi       
## 2 FALSE intervals   -0.148  -0.403 0.0313   0.95 mean   qi       
## 3 FALSE HOPs        -0.140  -0.432 0.0713   0.95 mean   qi       
## 4 FALSE QDPs        -0.0856 -0.267 0.0402   0.95 mean   qi       
## 5 TRUE  densities   -0.124  -0.354 0.0256   0.95 mean   qi       
## 6 TRUE  intervals   -0.143  -0.385 0.0277   0.95 mean   qi       
## 7 TRUE  HOPs        -0.140  -0.419 0.0665   0.95 mean   qi       
## 8 TRUE  QDPs        -0.0926 -0.273 0.0295   0.95 mean   qi

Effect of High vs Low Variance

Let’s look at the marginal effect of high vs low variance on LLO slopes. This is an exploratory comparison that we do not present in the paper.

model_df %>%
  group_by(means, sd_diff, condition, trial, start_means) %>%
  data_grid(lo_ground_truth = c(0, 1)) %>%          # get fitted draws (in log odds units) only for ground truth of 0 and 1
  add_fitted_draws(m.p_sup, re_formula = NA) %>%
  compare_levels(.value, by = lo_ground_truth) %>%  # calculate the difference between fits at 1 and 0 (i.e., slope)
  rename(slope = .value) %>%
  group_by(sd_diff, .draw) %>%                      # group by predictors to keep
  summarise(slope = mean(slope)) %>%                # marginalize by taking a weighted average
  compare_levels(slope, by = sd_diff) %>%
  ggplot(aes(x = slope, y = "Effect of Variance")) +
  stat_halfeyeh() +
  labs(subtitle = "Difference in LLO Slopes (High - Low Variance)") +
  theme_minimal() +
  theme(legend.position = "none")

It looks like LLO slopes are larger at high than at low variance. One potential reason for this is that high variance stimuli use white space more efficiently, making the task easier, especially for users relying on distance as a proxy for effect size.

Visualizing Posterior Predictions

Let’s look at predicted magnitude estimates to try to help with the interpretation of LLO slope as a metric.

model_df %>%
  data_grid(lo_ground_truth, means, sd_diff, condition, start_means, trial) %>%
  add_predicted_draws(m.p_sup, re_formula = NA, n = 500) %>%
  ggplot(aes(x = plogis(lo_ground_truth), y = plogis(.prediction), color = means, fill = means)) +
  stat_lineribbon(.width = c(.95), alpha = .25, show.legend = FALSE) +
  theme_minimal() +
  facet_grid(condition ~ sd_diff)

I find it hard to see the slope differences on this chart. The noise in posterior predictions swamps the signal we are able to measure using LLO slopes as a metric. This is what we are referring to when we say that LLO slopes give us greater statistical power than simpler metrics like accuracy.
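
A minimal sketch of why predictions look so much noisier than fits, assuming a Gaussian response: draws of the conditional mean carry only uncertainty about the mean, while posterior predictive draws add residual noise on top. All numbers below are made up for illustration and do not come from m.p_sup.

```r
# Simulated draws (hypothetical values, log odds scale)
set.seed(4)
n_draws <- 4000
mu_draws   <- rnorm(n_draws, mean = 0.8, sd = 0.03)       # uncertainty in the mean only
sigma      <- 1.0                                         # hypothetical residual sd
pred_draws <- rnorm(n_draws, mean = mu_draws, sd = sigma) # mean plus residual noise

# Width of the 95% quantile interval for each kind of draw
width <- function(x) unname(diff(quantile(x, c(0.025, 0.975))))
c(fitted = width(mu_draws), predicted = width(pred_draws))
```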

We can do a little better at showing the effect of interest by removing uncertainty in the prediction, but this seems a little antithetical to the whole point of the paper.

model_df %>%
  data_grid(lo_ground_truth, means, sd_diff, condition, start_means, trial) %>%
  add_predicted_draws(m.p_sup, re_formula = NA, n = 500) %>%
  group_by(lo_ground_truth, means, sd_diff, condition) %>% # marginalize
  mutate(
    ground_truth = plogis(lo_ground_truth),
    avg_prediction = mean(plogis(.prediction))
  ) %>%
  ggplot(aes(x = ground_truth, y = avg_prediction, color = means, fill = means)) +
  stat_lineribbon(.width = c(.95), alpha = .35, show.legend = FALSE) +
  theme_minimal() +
  # coord_cartesian(ylim = c(0, 1)) +
  facet_grid(condition ~ sd_diff)

We can also look at predicted errors in estimated probability of superiority to give a different view, although this isn’t much better.

model_df %>%
  data_grid(lo_ground_truth, means, sd_diff, condition, start_means, trial) %>%
  add_predicted_draws(m.p_sup, re_formula = NA, n = 500) %>%
  mutate(est_error = plogis(.prediction) - plogis(lo_ground_truth)) %>% # calculate estimation error
  ggplot(aes(x = plogis(lo_ground_truth), y = est_error, color = means, fill = means)) +
  stat_lineribbon(.width = c(.95), alpha = .25, show.legend = FALSE) +
  theme_minimal() +
  facet_grid(condition ~ sd_diff)

In the paper, we decided to describe posterior predictions in terms of marginal predicted average error for selected comparisons. We do this to contextualize LLO slopes in terms of average error, a more familiar but less precise metric for the kind of bias we measure.

Intervention Decisions

Next, we load in the model of intervention decisions that we arrived at through a process of model expansion described in our preregistration[https://osf.io/9kpmb]. This is a hierarchical logistic regression modeling the probability that chart users choose to pay for an intervention based on its effect size compared to the status quo if they do not pay. See the paper and experiment/analysis/InterventionDecisions.Rmd in the supplemental materials for details.
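
As a rough sketch of what the logistic model expresses for a single trial, the example below computes the evidence scale with the same formula used in the preprocessing step earlier in this document, then passes it through an inverse logit. The trial inputs and both coefficients are hypothetical values, not data from model-data.csv or estimates from m.decisions, and the sketch ignores all interactions and worker-level effects.

```r
# Hypothetical inputs for one trial (not taken from model-data.csv)
p_award_with    <- 0.70
p_award_without <- 0.50
award_value     <- 10

# Evidence scale, same formula as in the preprocessing step
evidence <- qlogis(p_award_with) - qlogis(p_award_without + 1 / award_value)

# Hypothetical population-level coefficients (not estimates from m.decisions)
b_intercept <- 0
b_evidence  <- 1

# Probability of choosing to intervene under this simplified model
p_intervene <- plogis(b_intercept + b_evidence * evidence)
p_intervene
```

With these made-up coefficients, evidence of 0 would correspond to a 50% chance of intervening, and positive evidence pushes that probability above 50%.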

m.decisions <- brm(
  data = model_df, family = bernoulli(link = "logit"),
  formula = bf(intervene ~ (1 + evidence*means*sd_diff + evidence*trial|worker_id) + evidence*means*sd_diff*condition*start_means + evidence*condition*trial),
  prior = c(prior(normal(0, 1), class = Intercept),
            prior(normal(1, 1), class = b, coef = evidence),
            prior(normal(0, 0.5), class = b),
            prior(normal(0, 0.5), class = sd),
            prior(lkj(4), class = cor)),
  iter = 8000, warmup = 2000, chains = 2, cores = 2, thin = 2,
  file = "model-fits/logistic_mdl-min_order-r_means_sd_trial2-long_chains")
summary(m.decisions)
##  Family: bernoulli 
##   Links: mu = logit 
## Formula: intervene ~ (1 + evidence * means * sd_diff + evidence * trial | worker_id) + evidence * means * sd_diff * condition * start_means + evidence * condition * trial 
##    Data: model_df (Number of observations: 19892) 
## Samples: 2 chains, each with iter = 8000; warmup = 2000; thin = 2;
##          total post-warmup samples = 6000
## 
## Group-Level Effects: 
## ~worker_id (Number of levels: 622) 
##                                                       Estimate Est.Error
## sd(Intercept)                                             1.86      0.09
## sd(evidence)                                              1.25      0.08
## sd(meansTRUE)                                             1.32      0.11
## sd(sd_diff15)                                             1.15      0.09
## sd(trial)                                                 2.51      0.16
## sd(evidence:meansTRUE)                                    0.77      0.11
## sd(evidence:sd_diff15)                                    0.73      0.10
## sd(meansTRUE:sd_diff15)                                   0.73      0.16
## sd(evidence:trial)                                        1.51      0.16
## sd(evidence:meansTRUE:sd_diff15)                          0.67      0.19
## cor(Intercept,evidence)                                   0.54      0.05
## cor(Intercept,meansTRUE)                                 -0.10      0.09
## cor(evidence,meansTRUE)                                   0.06      0.09
## cor(Intercept,sd_diff15)                                 -0.38      0.08
## cor(evidence,sd_diff15)                                  -0.04      0.09
## cor(meansTRUE,sd_diff15)                                  0.24      0.10
## cor(Intercept,trial)                                      0.37      0.06
## cor(evidence,trial)                                       0.14      0.08
## cor(meansTRUE,trial)                                      0.22      0.08
## cor(sd_diff15,trial)                                     -0.05      0.09
## cor(Intercept,evidence:meansTRUE)                        -0.17      0.11
## cor(evidence,evidence:meansTRUE)                         -0.04      0.13
## cor(meansTRUE,evidence:meansTRUE)                         0.45      0.12
## cor(sd_diff15,evidence:meansTRUE)                         0.29      0.12
## cor(trial,evidence:meansTRUE)                             0.16      0.12
## cor(Intercept,evidence:sd_diff15)                        -0.38      0.11
## cor(evidence,evidence:sd_diff15)                         -0.08      0.12
## cor(meansTRUE,evidence:sd_diff15)                        -0.08      0.13
## cor(sd_diff15,evidence:sd_diff15)                         0.64      0.10
## cor(trial,evidence:sd_diff15)                            -0.10      0.12
## cor(evidence:meansTRUE,evidence:sd_diff15)                0.16      0.15
## cor(Intercept,meansTRUE:sd_diff15)                       -0.10      0.15
## cor(evidence,meansTRUE:sd_diff15)                         0.30      0.14
## cor(meansTRUE,meansTRUE:sd_diff15)                       -0.16      0.17
## cor(sd_diff15,meansTRUE:sd_diff15)                        0.06      0.17
## cor(trial,meansTRUE:sd_diff15)                           -0.08      0.16
## cor(evidence:meansTRUE,meansTRUE:sd_diff15)               0.15      0.18
## cor(evidence:sd_diff15,meansTRUE:sd_diff15)               0.23      0.18
## cor(Intercept,evidence:trial)                             0.26      0.09
## cor(evidence,evidence:trial)                              0.40      0.10
## cor(meansTRUE,evidence:trial)                             0.06      0.12
## cor(sd_diff15,evidence:trial)                            -0.18      0.11
## cor(trial,evidence:trial)                                 0.50      0.09
## cor(evidence:meansTRUE,evidence:trial)                    0.26      0.13
## cor(evidence:sd_diff15,evidence:trial)                   -0.12      0.13
## cor(meansTRUE:sd_diff15,evidence:trial)                   0.12      0.17
## cor(Intercept,evidence:meansTRUE:sd_diff15)              -0.21      0.15
## cor(evidence,evidence:meansTRUE:sd_diff15)                0.10      0.16
## cor(meansTRUE,evidence:meansTRUE:sd_diff15)              -0.01      0.18
## cor(sd_diff15,evidence:meansTRUE:sd_diff15)               0.06      0.18
## cor(trial,evidence:meansTRUE:sd_diff15)                  -0.08      0.16
## cor(evidence:meansTRUE,evidence:meansTRUE:sd_diff15)     -0.07      0.19
## cor(evidence:sd_diff15,evidence:meansTRUE:sd_diff15)      0.09      0.19
## cor(meansTRUE:sd_diff15,evidence:meansTRUE:sd_diff15)     0.47      0.18
## cor(evidence:trial,evidence:meansTRUE:sd_diff15)         -0.11      0.17
##                                                       l-95% CI u-95% CI Rhat
## sd(Intercept)                                             1.69     2.04 1.00
## sd(evidence)                                              1.10     1.41 1.00
## sd(meansTRUE)                                             1.09     1.55 1.00
## sd(sd_diff15)                                             0.97     1.34 1.00
## sd(trial)                                                 2.19     2.83 1.00
## sd(evidence:meansTRUE)                                    0.56     0.98 1.00
## sd(evidence:sd_diff15)                                    0.53     0.92 1.00
## sd(meansTRUE:sd_diff15)                                   0.40     1.04 1.00
## sd(evidence:trial)                                        1.20     1.83 1.00
## sd(evidence:meansTRUE:sd_diff15)                          0.26     1.02 1.00
## cor(Intercept,evidence)                                   0.43     0.64 1.00
## cor(Intercept,meansTRUE)                                 -0.27     0.08 1.00
## cor(evidence,meansTRUE)                                  -0.11     0.23 1.00
## cor(Intercept,sd_diff15)                                 -0.52    -0.22 1.00
## cor(evidence,sd_diff15)                                  -0.21     0.13 1.00
## cor(meansTRUE,sd_diff15)                                  0.04     0.42 1.00
## cor(Intercept,trial)                                      0.24     0.49 1.00
## cor(evidence,trial)                                      -0.02     0.28 1.00
## cor(meansTRUE,trial)                                      0.06     0.39 1.00
## cor(sd_diff15,trial)                                     -0.23     0.13 1.00
## cor(Intercept,evidence:meansTRUE)                        -0.39     0.04 1.00
## cor(evidence,evidence:meansTRUE)                         -0.29     0.22 1.00
## cor(meansTRUE,evidence:meansTRUE)                         0.22     0.68 1.00
## cor(sd_diff15,evidence:meansTRUE)                         0.04     0.52 1.00
## cor(trial,evidence:meansTRUE)                            -0.08     0.39 1.00
## cor(Intercept,evidence:sd_diff15)                        -0.58    -0.16 1.00
## cor(evidence,evidence:sd_diff15)                         -0.31     0.17 1.00
## cor(meansTRUE,evidence:sd_diff15)                        -0.33     0.17 1.00
## cor(sd_diff15,evidence:sd_diff15)                         0.44     0.81 1.00
## cor(trial,evidence:sd_diff15)                            -0.34     0.13 1.00
## cor(evidence:meansTRUE,evidence:sd_diff15)               -0.14     0.44 1.00
## cor(Intercept,meansTRUE:sd_diff15)                       -0.40     0.19 1.00
## cor(evidence,meansTRUE:sd_diff15)                         0.02     0.57 1.00
## cor(meansTRUE,meansTRUE:sd_diff15)                       -0.46     0.19 1.00
## cor(sd_diff15,meansTRUE:sd_diff15)                       -0.26     0.42 1.00
## cor(trial,meansTRUE:sd_diff15)                           -0.39     0.24 1.00
## cor(evidence:meansTRUE,meansTRUE:sd_diff15)              -0.20     0.50 1.00
## cor(evidence:sd_diff15,meansTRUE:sd_diff15)              -0.13     0.58 1.00
## cor(Intercept,evidence:trial)                             0.08     0.43 1.00
## cor(evidence,evidence:trial)                              0.20     0.59 1.00
## cor(meansTRUE,evidence:trial)                            -0.17     0.29 1.00
## cor(sd_diff15,evidence:trial)                            -0.38     0.04 1.00
## cor(trial,evidence:trial)                                 0.33     0.67 1.00
## cor(evidence:meansTRUE,evidence:trial)                   -0.00     0.50 1.00
## cor(evidence:sd_diff15,evidence:trial)                   -0.38     0.15 1.00
## cor(meansTRUE:sd_diff15,evidence:trial)                  -0.22     0.45 1.00
## cor(Intercept,evidence:meansTRUE:sd_diff15)              -0.50     0.10 1.00
## cor(evidence,evidence:meansTRUE:sd_diff15)               -0.21     0.39 1.00
## cor(meansTRUE,evidence:meansTRUE:sd_diff15)              -0.36     0.34 1.00
## cor(sd_diff15,evidence:meansTRUE:sd_diff15)              -0.28     0.42 1.00
## cor(trial,evidence:meansTRUE:sd_diff15)                  -0.40     0.24 1.00
## cor(evidence:meansTRUE,evidence:meansTRUE:sd_diff15)     -0.42     0.33 1.00
## cor(evidence:sd_diff15,evidence:meansTRUE:sd_diff15)     -0.26     0.49 1.00
## cor(meansTRUE:sd_diff15,evidence:meansTRUE:sd_diff15)     0.07     0.76 1.00
## cor(evidence:trial,evidence:meansTRUE:sd_diff15)         -0.43     0.24 1.00
##                                                       Bulk_ESS Tail_ESS
## sd(Intercept)                                             3190     4292
## sd(evidence)                                              3058     4461
## sd(meansTRUE)                                             1612     2990
## sd(sd_diff15)                                             2599     4180
## sd(trial)                                                 2471     4138
## sd(evidence:meansTRUE)                                    1538     2446
## sd(evidence:sd_diff15)                                    1921     3090
## sd(meansTRUE:sd_diff15)                                   1032     2038
## sd(evidence:trial)                                        2350     3688
## sd(evidence:meansTRUE:sd_diff15)                           848      847
## cor(Intercept,evidence)                                   2272     3940
## cor(Intercept,meansTRUE)                                  2406     3750
## cor(evidence,meansTRUE)                                   1949     3268
## cor(Intercept,sd_diff15)                                  2968     4690
## cor(evidence,sd_diff15)                                   2082     3333
## cor(meansTRUE,sd_diff15)                                  1668     2843
## cor(Intercept,trial)                                      2898     4233
## cor(evidence,trial)                                       2063     3515
## cor(meansTRUE,trial)                                      1933     3834
## cor(sd_diff15,trial)                                      1858     3241
## cor(Intercept,evidence:meansTRUE)                         2846     4493
## cor(evidence,evidence:meansTRUE)                          2600     3989
## cor(meansTRUE,evidence:meansTRUE)                         1312     3390
## cor(sd_diff15,evidence:meansTRUE)                         1636     3054
## cor(trial,evidence:meansTRUE)                             2019     3719
## cor(Intercept,evidence:sd_diff15)                         2821     4307
## cor(evidence,evidence:sd_diff15)                          3105     4384
## cor(meansTRUE,evidence:sd_diff15)                         2264     4012
## cor(sd_diff15,evidence:sd_diff15)                         2212     3869
## cor(trial,evidence:sd_diff15)                             2831     4078
## cor(evidence:meansTRUE,evidence:sd_diff15)                2419     3804
## cor(Intercept,meansTRUE:sd_diff15)                        3867     4551
## cor(evidence,meansTRUE:sd_diff15)                         3152     4670
## cor(meansTRUE,meansTRUE:sd_diff15)                        2511     4118
## cor(sd_diff15,meansTRUE:sd_diff15)                        1734     3474
## cor(trial,meansTRUE:sd_diff15)                            2015     3868
## cor(evidence:meansTRUE,meansTRUE:sd_diff15)               1819     3396
## cor(evidence:sd_diff15,meansTRUE:sd_diff15)               1727     3493
## cor(Intercept,evidence:trial)                             3068     4615
## cor(evidence,evidence:trial)                              2680     4130
## cor(meansTRUE,evidence:trial)                             1791     3355
## cor(sd_diff15,evidence:trial)                             2246     3842
## cor(trial,evidence:trial)                                 2195     3860
## cor(evidence:meansTRUE,evidence:trial)                    2427     3776
## cor(evidence:sd_diff15,evidence:trial)                    2007     3719
## cor(meansTRUE:sd_diff15,evidence:trial)                   1399     2790
## cor(Intercept,evidence:meansTRUE:sd_diff15)               3459     3849
## cor(evidence,evidence:meansTRUE:sd_diff15)                3905     4262
## cor(meansTRUE,evidence:meansTRUE:sd_diff15)               3161     3810
## cor(sd_diff15,evidence:meansTRUE:sd_diff15)               2776     4130
## cor(trial,evidence:meansTRUE:sd_diff15)                   3646     4390
## cor(evidence:meansTRUE,evidence:meansTRUE:sd_diff15)      2565     4190
## cor(evidence:sd_diff15,evidence:meansTRUE:sd_diff15)      1800     3564
## cor(meansTRUE:sd_diff15,evidence:meansTRUE:sd_diff15)     1630     1885
## cor(evidence:trial,evidence:meansTRUE:sd_diff15)          2625     4330
## 
## Population-Level Effects: 
##                                                                 Estimate
## Intercept                                                           0.33
## evidence                                                            2.15
## meansTRUE                                                          -0.41
## sd_diff15                                                           1.07
## conditionHOPs                                                      -0.26
## conditionintervals                                                 -0.33
## conditionQDPs                                                       0.31
## start_meansTRUE                                                    -0.50
## trial                                                               1.26
## evidence:meansTRUE                                                 -0.12
## evidence:sd_diff15                                                  0.61
## meansTRUE:sd_diff15                                                 0.64
## evidence:conditionHOPs                                             -0.19
## evidence:conditionintervals                                        -0.19
## evidence:conditionQDPs                                              0.31
## meansTRUE:conditionHOPs                                             0.01
## meansTRUE:conditionintervals                                       -0.01
## meansTRUE:conditionQDPs                                            -0.33
## sd_diff15:conditionHOPs                                             0.46
## sd_diff15:conditionintervals                                        0.36
## sd_diff15:conditionQDPs                                             0.09
## evidence:start_meansTRUE                                           -0.50
## meansTRUE:start_meansTRUE                                           0.49
## sd_diff15:start_meansTRUE                                           0.50
## conditionHOPs:start_meansTRUE                                      -0.37
## conditionintervals:start_meansTRUE                                 -0.29
## conditionQDPs:start_meansTRUE                                       0.22
## evidence:trial                                                      1.71
## conditionHOPs:trial                                                -0.02
## conditionintervals:trial                                            0.65
## conditionQDPs:trial                                                 0.33
## evidence:meansTRUE:sd_diff15                                        0.04
## evidence:meansTRUE:conditionHOPs                                   -0.27
## evidence:meansTRUE:conditionintervals                               0.16
## evidence:meansTRUE:conditionQDPs                                    0.05
## evidence:sd_diff15:conditionHOPs                                    0.09
## evidence:sd_diff15:conditionintervals                               0.33
## evidence:sd_diff15:conditionQDPs                                    0.20
## meansTRUE:sd_diff15:conditionHOPs                                  -0.53
## meansTRUE:sd_diff15:conditionintervals                              0.55
## meansTRUE:sd_diff15:conditionQDPs                                   0.25
## evidence:meansTRUE:start_meansTRUE                                  0.38
## evidence:sd_diff15:start_meansTRUE                                  0.21
## meansTRUE:sd_diff15:start_meansTRUE                                -0.05
## evidence:conditionHOPs:start_meansTRUE                             -0.35
## evidence:conditionintervals:start_meansTRUE                         0.13
## evidence:conditionQDPs:start_meansTRUE                             -0.10
## meansTRUE:conditionHOPs:start_meansTRUE                             0.16
## meansTRUE:conditionintervals:start_meansTRUE                        0.14
## meansTRUE:conditionQDPs:start_meansTRUE                             0.01
## sd_diff15:conditionHOPs:start_meansTRUE                             0.13
## sd_diff15:conditionintervals:start_meansTRUE                       -0.33
## sd_diff15:conditionQDPs:start_meansTRUE                            -0.25
## evidence:conditionHOPs:trial                                       -0.44
## evidence:conditionintervals:trial                                   0.68
## evidence:conditionQDPs:trial                                        0.39
## evidence:meansTRUE:sd_diff15:conditionHOPs                         -0.46
## evidence:meansTRUE:sd_diff15:conditionintervals                     0.25
## evidence:meansTRUE:sd_diff15:conditionQDPs                         -0.07
## evidence:meansTRUE:sd_diff15:start_meansTRUE                        0.31
## evidence:meansTRUE:conditionHOPs:start_meansTRUE                    0.54
## evidence:meansTRUE:conditionintervals:start_meansTRUE              -0.01
## evidence:meansTRUE:conditionQDPs:start_meansTRUE                    0.15
## evidence:sd_diff15:conditionHOPs:start_meansTRUE                    0.11
## evidence:sd_diff15:conditionintervals:start_meansTRUE              -0.39
## evidence:sd_diff15:conditionQDPs:start_meansTRUE                   -0.01
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE                   0.34
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE             -0.26
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE                  -0.17
## evidence:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE          0.27
## evidence:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE     0.29
## evidence:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE         -0.05
##                                                                 Est.Error
## Intercept                                                            0.17
## evidence                                                             0.15
## meansTRUE                                                            0.18
## sd_diff15                                                            0.16
## conditionHOPs                                                        0.23
## conditionintervals                                                   0.23
## conditionQDPs                                                        0.23
## start_meansTRUE                                                      0.22
## trial                                                                0.22
## evidence:meansTRUE                                                   0.17
## evidence:sd_diff15                                                   0.16
## meansTRUE:sd_diff15                                                  0.19
## evidence:conditionHOPs                                               0.19
## evidence:conditionintervals                                          0.19
## evidence:conditionQDPs                                               0.20
## meansTRUE:conditionHOPs                                              0.25
## meansTRUE:conditionintervals                                         0.25
## meansTRUE:conditionQDPs                                              0.25
## sd_diff15:conditionHOPs                                              0.22
## sd_diff15:conditionintervals                                         0.21
## sd_diff15:conditionQDPs                                              0.22
## evidence:start_meansTRUE                                             0.19
## meansTRUE:start_meansTRUE                                            0.23
## sd_diff15:start_meansTRUE                                            0.20
## conditionHOPs:start_meansTRUE                                        0.31
## conditionintervals:start_meansTRUE                                   0.30
## conditionQDPs:start_meansTRUE                                        0.30
## evidence:trial                                                       0.22
## conditionHOPs:trial                                                  0.31
## conditionintervals:trial                                             0.31
## conditionQDPs:trial                                                  0.32
## evidence:meansTRUE:sd_diff15                                         0.21
## evidence:meansTRUE:conditionHOPs                                     0.22
## evidence:meansTRUE:conditionintervals                                0.23
## evidence:meansTRUE:conditionQDPs                                     0.24
## evidence:sd_diff15:conditionHOPs                                     0.20
## evidence:sd_diff15:conditionintervals                                0.21
## evidence:sd_diff15:conditionQDPs                                     0.22
## meansTRUE:sd_diff15:conditionHOPs                                    0.25
## meansTRUE:sd_diff15:conditionintervals                               0.26
## meansTRUE:sd_diff15:conditionQDPs                                    0.26
## evidence:meansTRUE:start_meansTRUE                                   0.22
## evidence:sd_diff15:start_meansTRUE                                   0.20
## meansTRUE:sd_diff15:start_meansTRUE                                  0.23
## evidence:conditionHOPs:start_meansTRUE                               0.26
## evidence:conditionintervals:start_meansTRUE                          0.27
## evidence:conditionQDPs:start_meansTRUE                               0.28
## meansTRUE:conditionHOPs:start_meansTRUE                              0.32
## meansTRUE:conditionintervals:start_meansTRUE                         0.32
## meansTRUE:conditionQDPs:start_meansTRUE                              0.32
## sd_diff15:conditionHOPs:start_meansTRUE                              0.28
## sd_diff15:conditionintervals:start_meansTRUE                         0.28
## sd_diff15:conditionQDPs:start_meansTRUE                              0.28
## evidence:conditionHOPs:trial                                         0.29
## evidence:conditionintervals:trial                                    0.29
## evidence:conditionQDPs:trial                                         0.30
## evidence:meansTRUE:sd_diff15:conditionHOPs                           0.25
## evidence:meansTRUE:sd_diff15:conditionintervals                      0.27
## evidence:meansTRUE:sd_diff15:conditionQDPs                           0.27
## evidence:meansTRUE:sd_diff15:start_meansTRUE                         0.24
## evidence:meansTRUE:conditionHOPs:start_meansTRUE                     0.29
## evidence:meansTRUE:conditionintervals:start_meansTRUE                0.31
## evidence:meansTRUE:conditionQDPs:start_meansTRUE                     0.31
## evidence:sd_diff15:conditionHOPs:start_meansTRUE                     0.27
## evidence:sd_diff15:conditionintervals:start_meansTRUE                0.27
## evidence:sd_diff15:conditionQDPs:start_meansTRUE                     0.28
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE                    0.32
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE               0.32
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE                    0.32
## evidence:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE           0.31
## evidence:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE      0.33
## evidence:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE           0.33
##                                                                 l-95% CI
## Intercept                                                          -0.00
## evidence                                                            1.87
## meansTRUE                                                          -0.77
## sd_diff15                                                           0.77
## conditionHOPs                                                      -0.71
## conditionintervals                                                 -0.78
## conditionQDPs                                                      -0.14
## start_meansTRUE                                                    -0.93
## trial                                                               0.84
## evidence:meansTRUE                                                 -0.45
## evidence:sd_diff15                                                  0.30
## meansTRUE:sd_diff15                                                 0.28
## evidence:conditionHOPs                                             -0.57
## evidence:conditionintervals                                        -0.57
## evidence:conditionQDPs                                             -0.08
## meansTRUE:conditionHOPs                                            -0.49
## meansTRUE:conditionintervals                                       -0.49
## meansTRUE:conditionQDPs                                            -0.82
## sd_diff15:conditionHOPs                                             0.04
## sd_diff15:conditionintervals                                       -0.05
## sd_diff15:conditionQDPs                                            -0.35
## evidence:start_meansTRUE                                           -0.88
## meansTRUE:start_meansTRUE                                           0.02
## sd_diff15:start_meansTRUE                                           0.11
## conditionHOPs:start_meansTRUE                                      -0.98
## conditionintervals:start_meansTRUE                                 -0.88
## conditionQDPs:start_meansTRUE                                      -0.36
## evidence:trial                                                      1.28
## conditionHOPs:trial                                                -0.63
## conditionintervals:trial                                            0.06
## conditionQDPs:trial                                                -0.29
## evidence:meansTRUE:sd_diff15                                       -0.37
## evidence:meansTRUE:conditionHOPs                                   -0.70
## evidence:meansTRUE:conditionintervals                              -0.29
## evidence:meansTRUE:conditionQDPs                                   -0.42
## evidence:sd_diff15:conditionHOPs                                   -0.31
## evidence:sd_diff15:conditionintervals                              -0.08
## evidence:sd_diff15:conditionQDPs                                   -0.22
## meansTRUE:sd_diff15:conditionHOPs                                  -1.00
## meansTRUE:sd_diff15:conditionintervals                              0.04
## meansTRUE:sd_diff15:conditionQDPs                                  -0.27
## evidence:meansTRUE:start_meansTRUE                                 -0.05
## evidence:sd_diff15:start_meansTRUE                                 -0.17
## meansTRUE:sd_diff15:start_meansTRUE                                -0.50
## evidence:conditionHOPs:start_meansTRUE                             -0.87
## evidence:conditionintervals:start_meansTRUE                        -0.39
## evidence:conditionQDPs:start_meansTRUE                             -0.65
## meansTRUE:conditionHOPs:start_meansTRUE                            -0.45
## meansTRUE:conditionintervals:start_meansTRUE                       -0.48
## meansTRUE:conditionQDPs:start_meansTRUE                            -0.63
## sd_diff15:conditionHOPs:start_meansTRUE                            -0.41
## sd_diff15:conditionintervals:start_meansTRUE                       -0.88
## sd_diff15:conditionQDPs:start_meansTRUE                            -0.81
## evidence:conditionHOPs:trial                                       -1.01
## evidence:conditionintervals:trial                                   0.12
## evidence:conditionQDPs:trial                                       -0.19
## evidence:meansTRUE:sd_diff15:conditionHOPs                         -0.95
## evidence:meansTRUE:sd_diff15:conditionintervals                    -0.27
## evidence:meansTRUE:sd_diff15:conditionQDPs                         -0.60
## evidence:meansTRUE:sd_diff15:start_meansTRUE                       -0.16
## evidence:meansTRUE:conditionHOPs:start_meansTRUE                   -0.04
## evidence:meansTRUE:conditionintervals:start_meansTRUE              -0.62
## evidence:meansTRUE:conditionQDPs:start_meansTRUE                   -0.47
## evidence:sd_diff15:conditionHOPs:start_meansTRUE                   -0.40
## evidence:sd_diff15:conditionintervals:start_meansTRUE              -0.93
## evidence:sd_diff15:conditionQDPs:start_meansTRUE                   -0.56
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE                  -0.27
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE             -0.88
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE                  -0.78
## evidence:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE         -0.35
## evidence:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE    -0.35
## evidence:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE         -0.69
##                                                                 u-95% CI Rhat
## Intercept                                                           0.66 1.00
## evidence                                                            2.45 1.00
## meansTRUE                                                          -0.06 1.00
## sd_diff15                                                           1.37 1.00
## conditionHOPs                                                       0.19 1.00
## conditionintervals                                                  0.13 1.00
## conditionQDPs                                                       0.76 1.00
## start_meansTRUE                                                    -0.07 1.00
## trial                                                               1.71 1.00
## evidence:meansTRUE                                                  0.22 1.00
## evidence:sd_diff15                                                  0.92 1.00
## meansTRUE:sd_diff15                                                 1.01 1.00
## evidence:conditionHOPs                                              0.19 1.00
## evidence:conditionintervals                                         0.19 1.00
## evidence:conditionQDPs                                              0.70 1.00
## meansTRUE:conditionHOPs                                             0.48 1.00
## meansTRUE:conditionintervals                                        0.49 1.00
## meansTRUE:conditionQDPs                                             0.17 1.00
## sd_diff15:conditionHOPs                                             0.89 1.00
## sd_diff15:conditionintervals                                        0.78 1.00
## sd_diff15:conditionQDPs                                             0.51 1.00
## evidence:start_meansTRUE                                           -0.12 1.00
## meansTRUE:start_meansTRUE                                           0.94 1.00
## sd_diff15:start_meansTRUE                                           0.89 1.00
## conditionHOPs:start_meansTRUE                                       0.24 1.00
## conditionintervals:start_meansTRUE                                  0.30 1.00
## conditionQDPs:start_meansTRUE                                       0.83 1.00
## evidence:trial                                                      2.12 1.00
## conditionHOPs:trial                                                 0.57 1.00
## conditionintervals:trial                                            1.27 1.00
## conditionQDPs:trial                                                 0.95 1.00
## evidence:meansTRUE:sd_diff15                                        0.44 1.00
## evidence:meansTRUE:conditionHOPs                                    0.15 1.00
## evidence:meansTRUE:conditionintervals                               0.62 1.00
## evidence:meansTRUE:conditionQDPs                                    0.51 1.00
## evidence:sd_diff15:conditionHOPs                                    0.49 1.00
## evidence:sd_diff15:conditionintervals                               0.74 1.00
## evidence:sd_diff15:conditionQDPs                                    0.63 1.00
## meansTRUE:sd_diff15:conditionHOPs                                  -0.05 1.00
## meansTRUE:sd_diff15:conditionintervals                              1.05 1.00
## meansTRUE:sd_diff15:conditionQDPs                                   0.78 1.00
## evidence:meansTRUE:start_meansTRUE                                  0.81 1.00
## evidence:sd_diff15:start_meansTRUE                                  0.59 1.00
## meansTRUE:sd_diff15:start_meansTRUE                                 0.39 1.00
## evidence:conditionHOPs:start_meansTRUE                              0.16 1.00
## evidence:conditionintervals:start_meansTRUE                         0.66 1.00
## evidence:conditionQDPs:start_meansTRUE                              0.44 1.00
## meansTRUE:conditionHOPs:start_meansTRUE                             0.78 1.00
## meansTRUE:conditionintervals:start_meansTRUE                        0.76 1.00
## meansTRUE:conditionQDPs:start_meansTRUE                             0.62 1.00
## sd_diff15:conditionHOPs:start_meansTRUE                             0.68 1.00
## sd_diff15:conditionintervals:start_meansTRUE                        0.22 1.00
## sd_diff15:conditionQDPs:start_meansTRUE                             0.31 1.00
## evidence:conditionHOPs:trial                                        0.11 1.00
## evidence:conditionintervals:trial                                   1.25 1.00
## evidence:conditionQDPs:trial                                        0.97 1.00
## evidence:meansTRUE:sd_diff15:conditionHOPs                          0.03 1.00
## evidence:meansTRUE:sd_diff15:conditionintervals                     0.77 1.00
## evidence:meansTRUE:sd_diff15:conditionQDPs                          0.47 1.00
## evidence:meansTRUE:sd_diff15:start_meansTRUE                        0.77 1.00
## evidence:meansTRUE:conditionHOPs:start_meansTRUE                    1.11 1.00
## evidence:meansTRUE:conditionintervals:start_meansTRUE               0.60 1.00
## evidence:meansTRUE:conditionQDPs:start_meansTRUE                    0.77 1.00
## evidence:sd_diff15:conditionHOPs:start_meansTRUE                    0.62 1.00
## evidence:sd_diff15:conditionintervals:start_meansTRUE               0.16 1.00
## evidence:sd_diff15:conditionQDPs:start_meansTRUE                    0.54 1.00
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE                   0.95 1.00
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE              0.39 1.00
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE                   0.47 1.00
## evidence:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE          0.88 1.00
## evidence:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE     0.93 1.00
## evidence:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE          0.61 1.00
##                                                                 Bulk_ESS
## Intercept                                                           2871
## evidence                                                            2786
## meansTRUE                                                           3987
## sd_diff15                                                           3702
## conditionHOPs                                                       2996
## conditionintervals                                                  3053
## conditionQDPs                                                       3122
## start_meansTRUE                                                     3228
## trial                                                               4422
## evidence:meansTRUE                                                  3939
## evidence:sd_diff15                                                  3157
## meansTRUE:sd_diff15                                                 4189
## evidence:conditionHOPs                                              3249
## evidence:conditionintervals                                         3086
## evidence:conditionQDPs                                              3351
## meansTRUE:conditionHOPs                                             4807
## meansTRUE:conditionintervals                                        4322
## meansTRUE:conditionQDPs                                             4868
## sd_diff15:conditionHOPs                                             3502
## sd_diff15:conditionintervals                                        4409
## sd_diff15:conditionQDPs                                             4067
## evidence:start_meansTRUE                                            3392
## meansTRUE:start_meansTRUE                                           4654
## sd_diff15:start_meansTRUE                                           4145
## conditionHOPs:start_meansTRUE                                       3588
## conditionintervals:start_meansTRUE                                  3593
## conditionQDPs:start_meansTRUE                                       3362
## evidence:trial                                                      4500
## conditionHOPs:trial                                                 4810
## conditionintervals:trial                                            4997
## conditionQDPs:trial                                                 4993
## evidence:meansTRUE:sd_diff15                                        3630
## evidence:meansTRUE:conditionHOPs                                    4796
## evidence:meansTRUE:conditionintervals                               4310
## evidence:meansTRUE:conditionQDPs                                    4424
## evidence:sd_diff15:conditionHOPs                                    4607
## evidence:sd_diff15:conditionintervals                               4113
## evidence:sd_diff15:conditionQDPs                                    4403
## meansTRUE:sd_diff15:conditionHOPs                                   4949
## meansTRUE:sd_diff15:conditionintervals                              4750
## meansTRUE:sd_diff15:conditionQDPs                                   4967
## evidence:meansTRUE:start_meansTRUE                                  4589
## evidence:sd_diff15:start_meansTRUE                                  3544
## meansTRUE:sd_diff15:start_meansTRUE                                 4248
## evidence:conditionHOPs:start_meansTRUE                              3795
## evidence:conditionintervals:start_meansTRUE                         3828
## evidence:conditionQDPs:start_meansTRUE                              3728
## meansTRUE:conditionHOPs:start_meansTRUE                             5141
## meansTRUE:conditionintervals:start_meansTRUE                        5186
## meansTRUE:conditionQDPs:start_meansTRUE                             5088
## sd_diff15:conditionHOPs:start_meansTRUE                             3921
## sd_diff15:conditionintervals:start_meansTRUE                        4202
## sd_diff15:conditionQDPs:start_meansTRUE                             4515
## evidence:conditionHOPs:trial                                        4765
## evidence:conditionintervals:trial                                   5027
## evidence:conditionQDPs:trial                                        4559
## evidence:meansTRUE:sd_diff15:conditionHOPs                          4527
## evidence:meansTRUE:sd_diff15:conditionintervals                     4891
## evidence:meansTRUE:sd_diff15:conditionQDPs                          4796
## evidence:meansTRUE:sd_diff15:start_meansTRUE                        4210
## evidence:meansTRUE:conditionHOPs:start_meansTRUE                    4561
## evidence:meansTRUE:conditionintervals:start_meansTRUE               4378
## evidence:meansTRUE:conditionQDPs:start_meansTRUE                    4901
## evidence:sd_diff15:conditionHOPs:start_meansTRUE                    4831
## evidence:sd_diff15:conditionintervals:start_meansTRUE               4245
## evidence:sd_diff15:conditionQDPs:start_meansTRUE                    4892
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE                   4779
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE              4387
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE                   5099
## evidence:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE          4464
## evidence:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE     5115
## evidence:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE          4681
##                                                                 Tail_ESS
## Intercept                                                           4164
## evidence                                                            4202
## meansTRUE                                                           5148
## sd_diff15                                                           4612
## conditionHOPs                                                       3740
## conditionintervals                                                  3933
## conditionQDPs                                                       4430
## start_meansTRUE                                                     4260
## trial                                                               5024
## evidence:meansTRUE                                                  4624
## evidence:sd_diff15                                                  4407
## meansTRUE:sd_diff15                                                 4986
## evidence:conditionHOPs                                              4303
## evidence:conditionintervals                                         4268
## evidence:conditionQDPs                                              4877
## meansTRUE:conditionHOPs                                             5366
## meansTRUE:conditionintervals                                        5005
## meansTRUE:conditionQDPs                                             5242
## sd_diff15:conditionHOPs                                             4354
## sd_diff15:conditionintervals                                        5051
## sd_diff15:conditionQDPs                                             4475
## evidence:start_meansTRUE                                            3776
## meansTRUE:start_meansTRUE                                           5044
## sd_diff15:start_meansTRUE                                           5328
## conditionHOPs:start_meansTRUE                                       4562
## conditionintervals:start_meansTRUE                                  4878
## conditionQDPs:start_meansTRUE                                       4445
## evidence:trial                                                      4754
## conditionHOPs:trial                                                 5275
## conditionintervals:trial                                            5009
## conditionQDPs:trial                                                 5487
## evidence:meansTRUE:sd_diff15                                        5179
## evidence:meansTRUE:conditionHOPs                                    5121
## evidence:meansTRUE:conditionintervals                               5001
## evidence:meansTRUE:conditionQDPs                                    5168
## evidence:sd_diff15:conditionHOPs                                    5324
## evidence:sd_diff15:conditionintervals                               5088
## evidence:sd_diff15:conditionQDPs                                    4921
## meansTRUE:sd_diff15:conditionHOPs                                   5156
## meansTRUE:sd_diff15:conditionintervals                              5196
## meansTRUE:sd_diff15:conditionQDPs                                   5265
## evidence:meansTRUE:start_meansTRUE                                  5128
## evidence:sd_diff15:start_meansTRUE                                  4911
## meansTRUE:sd_diff15:start_meansTRUE                                 5010
## evidence:conditionHOPs:start_meansTRUE                              4507
## evidence:conditionintervals:start_meansTRUE                         4957
## evidence:conditionQDPs:start_meansTRUE                              4697
## meansTRUE:conditionHOPs:start_meansTRUE                             5334
## meansTRUE:conditionintervals:start_meansTRUE                        5242
## meansTRUE:conditionQDPs:start_meansTRUE                             5336
## sd_diff15:conditionHOPs:start_meansTRUE                             5153
## sd_diff15:conditionintervals:start_meansTRUE                        4665
## sd_diff15:conditionQDPs:start_meansTRUE                             4711
## evidence:conditionHOPs:trial                                        5172
## evidence:conditionintervals:trial                                   5503
## evidence:conditionQDPs:trial                                        5431
## evidence:meansTRUE:sd_diff15:conditionHOPs                          5096
## evidence:meansTRUE:sd_diff15:conditionintervals                     4985
## evidence:meansTRUE:sd_diff15:conditionQDPs                          5409
## evidence:meansTRUE:sd_diff15:start_meansTRUE                        4878
## evidence:meansTRUE:conditionHOPs:start_meansTRUE                    5050
## evidence:meansTRUE:conditionintervals:start_meansTRUE               4924
## evidence:meansTRUE:conditionQDPs:start_meansTRUE                    5110
## evidence:sd_diff15:conditionHOPs:start_meansTRUE                    5468
## evidence:sd_diff15:conditionintervals:start_meansTRUE               5299
## evidence:sd_diff15:conditionQDPs:start_meansTRUE                    5508
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE                   5257
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE              5021
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE                   5457
## evidence:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE          5161
## evidence:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE     5433
## evidence:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE          5510
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

Our research questions are about the points of subjective equality (PSE) and just-noticeable differences (JND) for this logistic regression model. We derive estimates of these two statistics from the model’s posterior distribution.

# get slopes from linear model
slopes_df <- model_df %>%
  group_by(means, sd_diff, condition, trial, start_means) %>%
  data_grid(evidence = c(0, 1)) %>%
  add_fitted_draws(m.decisions, re_formula = NA, scale = "linear", seed = 1234) %>%
  compare_levels(.value, by = evidence) %>%
  rename(slope = .value)

# get intercepts from linear model
intercepts_df <- model_df %>%
  group_by(means, sd_diff, condition, trial, start_means) %>%
  data_grid(evidence = 0) %>%
  add_fitted_draws(m.decisions, re_formula = NA, scale = "linear", seed = 1234) %>%
  rename(intercept = .value) 

# join dataframes for slopes and intercepts, calculate PSE and JND
stats_df <- slopes_df %>% 
  full_join(intercepts_df, by = c("means", "sd_diff", "condition", "trial", "start_means", ".draw")) %>%
  mutate(
    # evidence units
    pse = -intercept / slope,
    jnd = qlogis(0.75) / slope,
    # probabilities of winning with the new player
    pse_p_award = exp(pse) / (1 / (unique(model_df$baseline) + 1 / unique(model_df$award_value)) - 1 + exp(pse)) - unique(model_df$baseline) - 1 / unique(model_df$award_value),
    jnd_p_award = exp(jnd) / (1 / (unique(model_df$baseline) + 1 / unique(model_df$award_value)) - 1 + exp(jnd)) - unique(model_df$baseline) - 1 / unique(model_df$award_value)
  )
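To make the formulas in the chunk above concrete, here is a minimal base R sketch of how PSE and JND follow from a logistic intercept and slope, and how a value in evidence units maps back to a difference in probability. The coefficient values, `baseline`, `award_value`, and the helper `evidence_to_p_diff()` are hypothetical stand-ins chosen for illustration; they are not estimates from `m.decisions` or values from `model_df`.

```r
# Hypothetical coefficients on the log odds scale (not fitted values)
intercept <- 0.4   # log odds of intervening when evidence = 0
slope     <- 1.6   # change in log odds per unit of evidence

# PSE: the evidence level where P(intervene) = 0.5,
# i.e., the solution of intercept + slope * x = 0
pse <- -intercept / slope

# JND: the extra evidence needed to move from P = 0.5 to P = 0.75
jnd <- qlogis(0.75) / slope

# Sanity checks on the probability scale
p_at_pse <- plogis(intercept + slope * pse)          # 0.5 up to floating point
p_at_jnd <- plogis(intercept + slope * (pse + jnd))  # 0.75 up to floating point

# Hypothetical task parameters (assumptions for this sketch)
baseline    <- 0.5
award_value <- 10
threshold   <- baseline + 1 / award_value   # utility-optimal decision threshold

# Same algebra as pse_p_award and jnd_p_award above:
# invert evidence = qlogis(p_with) - qlogis(threshold), then subtract threshold
evidence_to_p_diff <- function(x) {
  exp(x) / (1 / threshold - 1 + exp(x)) - threshold
}

# Equivalent form using plogis(), as a cross-check
p_diff_check <- plogis(pse + qlogis(threshold)) - threshold
```

Zero evidence maps to a probability difference of zero, so `evidence_to_p_diff(0)` returns 0 and negative PSEs map to probabilities of winning below the utility-optimal threshold.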

Points of Subjective Equality (PSE)

PSEs describe a chart user’s bias toward or against intervening relative to the utility-optimal decision criterion on the evidence scale (a proxy for effect size, which is described in the paper).

Effects of Adding Means

Let’s take a look at the interaction effects on PSE of adding means at different levels of variance.

stats_df %>%
  group_by(means, sd_diff, condition, .draw) %>%          # marginalize out other manipulations
  summarise(pse = mean(pse)) %>%
  ggplot(aes(x = pse, y = condition, group = means, fill = means)) +
  stat_slabh(alpha = 0.35) +
  labs(subtitle = "PSE Interaction") +
  theme_minimal() +
  facet_grid(sd_diff ~ .)

Let’s look at contrasts for the impact of the mean.

stats_df %>%
  group_by(means, sd_diff, condition, .draw) %>%          # marginalize out other manipulations
  summarise(pse = mean(pse)) %>%
  compare_levels(pse, by = means) %>%
  ggplot(aes(x = pse, y = condition)) +
  stat_halfeyeh() +
  labs(subtitle = "Difference in PSE (Means present - absent)") +
  theme_minimal() +
  facet_grid(sd_diff ~ .)

In terms of the direction of effect, extrinsic means seem to consistently bias PSE toward intervention at high variance and away from intervention at low variance. This has the impact of exacerbating biases in decisions compared to when means are absent (with the exception of quantile dotplots at low variance). However, the effect of adding the mean only appears to be reliable for quantile dotplots at low variance and for intervals and maybe densities at high variance. We suspect that more data would shrink the uncertainty in these estimates, revealing this to be a persistent trend.

Quantile dotplots are slightly different from other charts in that they are the only uncertainty encoding that consistently biases users toward intervention, regardless of the level of variance. This means that the positive impact on PSE induced by adding means at low variance is debiasing for quantile dotplots, which is the only case where we can say that adding means is reliably helpful for decision-making.

Effect of means on PSE for each combination of visualization condition and level of variance (in figure).

pse_tbl <- stats_df %>%
  group_by(means, sd_diff, condition, .draw) %>%          # marginalize out other manipulations
  summarise(pse = mean(pse)) %>%
  compare_levels(pse, by = means) %>%
  mean_qi()
pse_p_tbl <- stats_df %>%
  group_by(means, sd_diff, condition, .draw) %>%          # marginalize out other manipulations
  summarise(pse_p_award = mean(pse_p_award)) %>%
  compare_levels(pse_p_award, by = means) %>%
  mean_qi()
pse_tbl %>% full_join(pse_p_tbl, by = c("means", "sd_diff", "condition"))
## # A tibble: 8 x 15
## # Groups:   means, sd_diff [2]
##   means sd_diff condition     pse .lower.x .upper.x .width.x .point.x
##   <fct> <fct>   <fct>       <dbl>    <dbl>    <dbl>    <dbl> <chr>   
## 1 TRUE… 5       densities  0.133   -0.0299  0.314       0.95 mean    
## 2 TRUE… 5       intervals  0.125   -0.144   0.433       0.95 mean    
## 3 TRUE… 5       HOPs       0.0908  -0.160   0.353       0.95 mean    
## 4 TRUE… 5       QDPs       0.252    0.0816  0.435       0.95 mean    
## 5 TRUE… 15      densities -0.116   -0.238   0.00174     0.95 mean    
## 6 TRUE… 15      intervals -0.160   -0.277  -0.0400      0.95 mean    
## 7 TRUE… 15      HOPs      -0.0984  -0.267   0.0603      0.95 mean    
## 8 TRUE… 15      QDPs      -0.0429  -0.156   0.0727      0.95 mean    
## # … with 7 more variables: .interval.x <chr>, pse_p_award <dbl>,
## #   .lower.y <dbl>, .upper.y <dbl>, .width.y <dbl>, .point.y <chr>,
## #   .interval.y <chr>
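The contrast tables in this document pair `compare_levels()` with `mean_qi()`. As a rough illustration of what that combination computes for a single visualization condition, here is a base R sketch on simulated draws. The draws, their means and standard deviations, and all variable names are made up for illustration. Reading "reliable" as a 95% interval that excludes zero is an assumption made for this sketch, and real posterior draws are paired by `.draw` and may be correlated, unlike the independent draws simulated here.

```r
set.seed(1)
n_draws <- 4000

# Hypothetical posterior draws of PSE for one chart type (not model output)
pse_without <- rnorm(n_draws, mean = -0.42, sd = 0.04)  # means absent
pse_with    <- rnorm(n_draws, mean = -0.58, sd = 0.04)  # means present

# Contrast computed within each draw (means present - absent),
# which keeps the full posterior distribution of the difference
pse_diff <- pse_with - pse_without

# Point estimate and 95% quantile interval of the contrast
contrast_summary <- c(
  mean  = mean(pse_diff),
  lower = unname(quantile(pse_diff, 0.025)),
  upper = unname(quantile(pse_diff, 0.975))
)

# Assumed shorthand: the contrast is "reliable" if the interval excludes zero
reliable <- contrast_summary[["lower"]] > 0 || contrast_summary[["upper"]] < 0
```

Computing the difference draw by draw, rather than subtracting two summary means, is what lets the interval reflect uncertainty in the contrast itself.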

PSE with and without means added for each combination of visualization condition and level of variance. These numbers help us explain the nuanced differences in PSE between visualization designs in the paper.

stats_df %>%
  group_by(means, sd_diff, condition, .draw) %>%          # marginalize out other manipulations
  summarise(
    pse = mean(pse),
    pse_p_award = mean(pse_p_award)
  ) %>%
  mean_qi()
## # A tibble: 16 x 12
## # Groups:   means, sd_diff [4]
##    means sd_diff condition     pse pse.lower pse.upper pse_p_award
##    <fct> <fct>   <fct>       <dbl>     <dbl>     <dbl>       <dbl>
##  1 FALSE 5       densities -0.0335   -0.188     0.127     -0.00611
##  2 FALSE 5       intervals  0.292     0.0775    0.548      0.0350 
##  3 FALSE 5       HOPs       0.255     0.0210    0.531      0.0324 
##  4 FALSE 5       QDPs      -0.213    -0.360    -0.0637    -0.0356 
##  5 FALSE 15      densities -0.530    -0.643    -0.418     -0.0936 
##  6 FALSE 15      intervals -0.423    -0.541    -0.302     -0.0727 
##  7 FALSE 15      HOPs      -0.625    -0.759    -0.496     -0.113  
##  8 FALSE 15      QDPs      -0.573    -0.676    -0.470     -0.102  
##  9 TRUE  5       densities  0.0994   -0.0969    0.323      0.0110 
## 10 TRUE  5       intervals  0.417     0.158     0.748      0.0420 
## 11 TRUE  5       HOPs       0.346     0.0719    0.674      0.0388 
## 12 TRUE  5       QDPs       0.0395   -0.137     0.242      0.00339
## 13 TRUE  15      densities -0.646    -0.770    -0.524     -0.117  
## 14 TRUE  15      intervals -0.583    -0.693    -0.477     -0.105  
## 15 TRUE  15      HOPs      -0.723    -0.888    -0.562     -0.134  
## 16 TRUE  15      QDPs      -0.616    -0.729    -0.501     -0.111  
## # … with 5 more variables: pse_p_award.lower <dbl>, pse_p_award.upper <dbl>,
## #   .width <dbl>, .point <chr>, .interval <chr>

Effect of means on PSE, marginalizing across visualization condition (in figure).

pse_tbl <- stats_df %>%
  group_by(means, sd_diff, .draw) %>% 
  summarise(pse = mean(pse)) %>%
  compare_levels(pse, by = means) %>%
  mean_qi()
pse_p_tbl <- stats_df %>%
  group_by(means, sd_diff, .draw) %>% 
  summarise(pse_p_award = mean(pse_p_award)) %>%
  compare_levels(pse_p_award, by = means) %>%
  mean_qi()
pse_tbl %>% full_join(pse_p_tbl, by = c("means", "sd_diff"))
## # A tibble: 2 x 14
## # Groups:   means [1]
##   means sd_diff    pse .lower.x .upper.x .width.x .point.x .interval.x
##   <fct> <fct>    <dbl>    <dbl>    <dbl>    <dbl> <chr>    <chr>      
## 1 TRUE… 5        0.150   0.0271   0.281      0.95 mean     qi         
## 2 TRUE… 15      -0.105  -0.175   -0.0358     0.95 mean     qi         
## # … with 6 more variables: pse_p_award <dbl>, .lower.y <dbl>, .upper.y <dbl>,
## #   .width.y <dbl>, .point.y <chr>, .interval.y <chr>

PSE with and without means, marginalizing across visualization conditions. These numbers help us explain the aggregate effect of adding means on decision quality at each level of variance.

stats_df %>%
  group_by(means, sd_diff, .draw) %>%          # marginalize out other manipulations
  summarise(
    pse = mean(pse),
    pse_p_award = mean(pse_p_award)
  ) %>%
  mean_qi()
## # A tibble: 4 x 11
## # Groups:   means [2]
##   means sd_diff     pse pse.lower pse.upper pse_p_award pse_p_award.low…
##   <fct> <fct>     <dbl>     <dbl>     <dbl>       <dbl>            <dbl>
## 1 FALSE 5        0.0750   -0.0344     0.197     0.00642         -0.00849
## 2 FALSE 15      -0.538    -0.599     -0.476    -0.0954          -0.108  
## 3 TRUE  5        0.225     0.0936     0.366     0.0238           0.00821
## 4 TRUE  15      -0.642    -0.712     -0.575    -0.117           -0.132  
## # … with 4 more variables: pse_p_award.upper <dbl>, .width <dbl>, .point <chr>,
## #   .interval <chr>

Let’s visualize the effect of means on PSE, marginalizing across visualization conditions, since this is particularly important and isn’t shown clearly above.

stats_df %>%
  group_by(means, sd_diff, .draw) %>%          # marginalize out other manipulations
  summarise(pse = mean(pse)) %>%
  compare_levels(pse, by = means) %>%
  ggplot(aes(x = pse, y = sd_diff)) +
  stat_halfeyeh() +
  labs(subtitle = "Difference in PSE (Means present - absent)") +
  theme_minimal()

It looks like the effect of means is reliable if we marginalize across visualization conditions, which lends credence to the argument that this effect is robust.

Visualization Effects

We preregistered comparisons between estimates of PSE per visualization, marginalizing across other manipulations. However, it occurs to us in hindsight that this marginalization corresponds to a visualization that designers cannot render: a chart both with and without means at the same time. Therefore, we omit these comparisons from the paper and present them only in supplemental materials.

stats_df %>%
  group_by(condition, .draw) %>%          # marginalize out other manipulations
  summarise(pse = mean(pse)) %>%
  ggplot(aes(x = pse, y = condition, fill = condition)) +
  stat_slabh(alpha = 0.35) +
  scale_fill_brewer(type = "qual", palette = 2) + 
  labs(subtitle = "PSE Per Visualization Condition") +
  theme_minimal() +
  theme(legend.position = "none")

stats_df %>%
  group_by(condition, .draw) %>%         
  summarise(
    pse = mean(pse),
    pse_p_award = mean(pse_p_award)
  ) %>%
  mean_qi()
## # A tibble: 4 x 10
##   condition     pse pse.lower pse.upper pse_p_award pse_p_award.low…
##   <fct>       <dbl>     <dbl>     <dbl>       <dbl>            <dbl>
## 1 densities -0.277     -0.392   -0.153      -0.0515          -0.0711
## 2 intervals -0.0746    -0.213    0.0795     -0.0252          -0.0441
## 3 HOPs      -0.187     -0.342   -0.0135     -0.0440          -0.0684
## 4 QDPs      -0.340     -0.447   -0.228      -0.0614          -0.0804
## # … with 4 more variables: pse_p_award.upper <dbl>, .width <dbl>, .point <chr>,
## #   .interval <chr>

Let’s look at contrasts between visualization conditions for visual reliability tests.

stats_df %>%
  group_by(condition, .draw) %>%          # marginalize out other manipulations
  summarise(pse = mean(pse)) %>%
  compare_levels(pse, by = condition) %>%
  ggplot(aes(x = pse, y = condition)) +
  stat_halfeyeh() +
  labs(subtitle = "Differences in PSE Between Visualization Conditions") +
  theme_minimal()

It looks like the point of subjective equality is least biased with intervals, with increasing bias toward intervening (i.e., negative PSE) with HOPs, densities, and quantile dotplots, respectively. Only pairwise differences of intervals minus densities and intervals minus quantile dotplots are reliable.

Effect of Visualization Design for Low and High Variance Separately

It seems like the patterns of results for PSE at low vs high variance are different enough that we might want to make different design recommendations depending on the level of variance shown in charts. For this reason, in the paper we present contrasts between visualization designs at low and high variance separately.

Let’s start by looking at contrasts between visualization designs at low variance.

stats_df %>%
  filter(sd_diff == 5) %>%
  unite(design, c("condition", "means")) %>%
  group_by(design, .draw) %>%                   # group by predictors to keep
  summarise(pse = mean(pse)) %>%                # marginalize by taking a weighted average
  compare_levels(pse, by = design) %>%
  ggplot(aes(x = pse, y = design)) +
  stat_halfeyeh() +
  labs(subtitle = "Differences in PSE Between Visualization Designs at Low Variance") +
  theme_minimal()

What we can take away from this figure that we didn’t get from the chart of interaction effects above is that there are no reliable differences among visualization designs that use intervals and HOPs for uncertainty encodings. This is true for densities and quantile dotplots as well, with the exception of the comparison between quantile dotplots without means and densities with means, two designs with opposite directions of bias. Densities without means and quantile dotplots with means are the least biased conditions and are reliably different from designs that use intervals and HOPs to encode uncertainty.

To reiterate this particularly important comparison: there is no reliable difference between densities without means and quantile dotplots with means.

stats_df %>%
  group_by(means, sd_diff, condition, .draw) %>%          # marginalize out other manipulations
  summarise(pse = mean(pse)) %>%
  filter(sd_diff == 5) %>%
  unite(vis_cond, condition, means) %>%
  filter(vis_cond %in% c("densities_FALSE", "QDPs_TRUE")) %>%
  compare_levels(pse, by = vis_cond) %>%
  ggplot(aes(x = pse, y = vis_cond)) +
  stat_halfeyeh() +
  labs(subtitle = "Differences in PSE Densities and QDPs at Low Variance") +
  theme_minimal()

Now, we’ll consider contrasts between visualization designs at high variance.

stats_df %>%
  filter(sd_diff == 15) %>%
  unite(design, c("condition", "means")) %>%
  group_by(design, .draw) %>%                   # group by predictors to keep
  summarise(pse = mean(pse)) %>%                # marginalize by taking a weighted average
  compare_levels(pse, by = design) %>%
  ggplot(aes(x = pse, y = design)) +
  stat_halfeyeh() +
  labs(subtitle = "Differences in PSE Between Visualization Designs at High Variance") +
  theme_minimal()

When we look for the least biased distributional encoding at high variance, intervals without means stand out. However, they are not reliably less biased than intervals without means.

Effect of High vs Low Variance

Now, let’s look at contrasts for the impact of the level of variance. This is an exploratory comparison.

stats_df %>%
  group_by(sd_diff, .draw) %>%          # marginalize out other manipulations (including means present/absent)
  summarise(pse = mean(pse)) %>%        # average PSE (not JND) to match the axis label below
  compare_levels(pse, by = sd_diff) %>%
  ggplot(aes(x = pse, y = "Effect of Variance")) +
  stat_halfeyeh() +
  labs(x = "Difference in PSE (High - Low Variance)") +
  theme_minimal()

People seem to intervene more than they should when uncertainty is high. It may be that users err on the side of caution in decision-making when the span of distributions is larger compared to the width of the axis. This was not really our primary research question, but it is an interesting result that future work should probably investigate further.

Just-Noticeable Differences (JNDs)

JNDs describe a chart user’s sensitivity to effect size information (i.e., evidence) for the purpose of making decisions.

Effects of Adding Means

Since we are interested in the way that extrinsic means impact the perception of effect size at different levels of variance, we look at how this effect manifests in JNDs.

stats_df %>%
  group_by(means, sd_diff, condition, .draw) %>%          # marginalize out other manipulations
  summarise(jnd = mean(jnd)) %>%
  ggplot(aes(x = jnd, y = condition, group = means, fill = means)) +
  stat_slabh(alpha = 0.35) +
  labs(subtitle = "JND Interaction") +
  theme_minimal() +
  facet_grid(sd_diff ~ .)

Let’s look at contrasts for the impact of the mean.

stats_df %>%
  group_by(means, sd_diff, condition, .draw) %>%          # marginalize out other manipulations
  summarise(jnd = mean(jnd)) %>%
  compare_levels(jnd, by = means) %>%
  ggplot(aes(x = jnd, y = condition)) +
  stat_halfeyeh() +
  labs(x = "JND Difference (Means present - absent)") +
  theme_minimal() +
  facet_grid(sd_diff ~ .)

Adding means seems to improve sensitivity for intervals at high variance. No other effects are reliable.

Effect of means on JNDs for each combination of visualization condition and level of variance (in figure).

jnd_tbl <- stats_df %>%
  group_by(means, sd_diff, condition, .draw) %>%          # marginalize out other manipulations
  summarise(jnd = mean(jnd)) %>%
  compare_levels(jnd, by = means) %>%
  mean_qi()
jnd_p_tbl <- stats_df %>%
  group_by(means, sd_diff, condition, .draw) %>%          # marginalize out other manipulations
  summarise(jnd_p_award = mean(jnd_p_award)) %>%
  compare_levels(jnd_p_award, by = means) %>%
  mean_qi()
jnd_tbl %>% full_join(jnd_p_tbl, by = c("means", "sd_diff", "condition"))
## # A tibble: 8 x 15
## # Groups:   means, sd_diff [2]
##   means sd_diff condition     jnd .lower.x .upper.x .width.x .point.x
##   <fct> <fct>   <fct>       <dbl>    <dbl>    <dbl>    <dbl> <chr>   
## 1 TRUE… 5       densities  0.0133  -0.0846   0.125      0.95 mean    
## 2 TRUE… 5       intervals -0.0547  -0.216    0.116      0.95 mean    
## 3 TRUE… 5       HOPs      -0.0139  -0.158    0.139      0.95 mean    
## 4 TRUE… 5       QDPs      -0.0188  -0.108    0.0732     0.95 mean    
## 5 TRUE… 15      densities -0.0409  -0.103    0.0196     0.95 mean    
## 6 TRUE… 15      intervals -0.105   -0.169   -0.0486     0.95 mean    
## 7 TRUE… 15      HOPs       0.0175  -0.0676   0.110      0.95 mean    
## 8 TRUE… 15      QDPs      -0.0320  -0.0828   0.0214     0.95 mean    
## # … with 7 more variables: .interval.x <chr>, jnd_p_award <dbl>,
## #   .lower.y <dbl>, .upper.y <dbl>, .width.y <dbl>, .point.y <chr>,
## #   .interval.y <chr>

Effect of means on JNDs, marginalizing across visualization condition. The effect of adding means is not reliable at either low or high variance in the aggregate (omitted from figure for space).

jnd_tbl <- stats_df %>%
  group_by(means, sd_diff, .draw) %>%
  summarise(jnd = mean(jnd)) %>%
  compare_levels(jnd, by = means) %>%
  mean_qi()
jnd_p_tbl <- stats_df %>%
  group_by(means, sd_diff, .draw) %>%
  summarise(jnd_p_award = mean(jnd_p_award)) %>%
  compare_levels(jnd_p_award, by = means) %>%
  mean_qi()
jnd_tbl %>% full_join(jnd_p_tbl, by = c("means", "sd_diff"))
## # A tibble: 2 x 14
## # Groups:   means [1]
##   means sd_diff     jnd .lower.x .upper.x .width.x .point.x .interval.x
##   <fct> <fct>     <dbl>    <dbl>    <dbl>    <dbl> <chr>    <chr>      
## 1 TRUE… 5       -0.0185  -0.0958  0.0576      0.95 mean     qi         
## 2 TRUE… 15      -0.0402  -0.0843  0.00351     0.95 mean     qi         
## # … with 6 more variables: jnd_p_award <dbl>, .lower.y <dbl>, .upper.y <dbl>,
## #   .width.y <dbl>, .point.y <chr>, .interval.y <chr>

Visualization Effects

We preregistered comparisons of JNDs per visualization, marginalizing across other manipulations. However, it occurs to us in hindsight that this marginalization corresponds to a visualization that designers cannot render: a chart both with and without means at the same time. Therefore, we omit these comparisons from the paper and present them only in supplemental materials.

stats_df %>%
  group_by(condition, .draw) %>%          # marginalize out other manipulations
  summarise(jnd = mean(jnd)) %>%
  ggplot(aes(x = jnd, y = condition, fill = condition)) +
  stat_slabh(alpha = 0.35) +
  scale_fill_brewer(type = "qual", palette = 2) + 
  labs(subtitle = "JND Per Visualization Condition") +
  theme_minimal() +
  theme(legend.position = "none")

stats_df %>%
  group_by(condition, .draw) %>%         
  summarise(
    jnd = mean(jnd),
    jnd_p_award = mean(jnd_p_award)
  ) %>%
  mean_qi()
## # A tibble: 4 x 10
##   condition   jnd jnd.lower jnd.upper jnd_p_award jnd_p_award.low…
##   <fct>     <dbl>     <dbl>     <dbl>       <dbl>            <dbl>
## 1 densities 0.502     0.447     0.567      0.0631           0.0575
## 2 intervals 0.520     0.458     0.598      0.0635           0.0579
## 3 HOPs      0.599     0.520     0.696      0.0727           0.0653
## 4 QDPs      0.431     0.388     0.481      0.0555           0.0510
## # … with 4 more variables: jnd_p_award.upper <dbl>, .width <dbl>, .point <chr>,
## #   .interval <chr>

Let’s look at contrasts between visualization conditions for visual reliability tests.

stats_df %>%
  group_by(condition, .draw) %>%          # marginalize out other manipulations
  summarise(jnd = mean(jnd)) %>%
  compare_levels(jnd, by = condition) %>%
  ggplot(aes(x = jnd, y = condition)) +
  stat_halfeyeh() +
  labs(x = "Differences in JNDs Between Visualization Conditions") +
  theme_minimal()

It looks like users are most sensitive to evidence (i.e., JNDs are smallest) in the quantile dotplots condition and least sensitive with HOPs. Only the differences between quantile dotplots and the other conditions are reliable.

Effect of Visualization Design when Averaging Over Variance

In the paper, we look at the JNDs for different visualization designs when we average over variance to give a sense of overall effectiveness on this metric.

stats_df %>%
  group_by(condition, means, .draw) %>%          # marginalize out other manipulations
  summarise(jnd = mean(jnd)) %>%
  ggplot(aes(x = jnd, y = condition, group = means, fill = means)) +
  stat_slabh(alpha = 0.35) +
  theme_minimal() +
  theme(legend.position = "none")

Let’s look at contrasts between each of these designs.

stats_df %>%
  unite(design, c("condition", "means")) %>%
  group_by(design, .draw) %>%                   # group by predictors to keep
  summarise(jnd = mean(jnd)) %>%                # marginalize by taking a weighted average
  compare_levels(jnd, by = design) %>%
  ggplot(aes(x = jnd, y = design)) +
  stat_halfeyeh() +
  labs(subtitle = "Differences in JND Between Visualization Designs, Averaging Over Variance") +
  theme_minimal()

We can see that quantile dotplots with or without means have reliably smaller JNDs than other conditions, with the exception of the contrast between quantile dotplots without means and densities with or without means. These are the only reliable differences between designs.

Effect of means on JNDs for each visualization condition, marginalizing over variance (in figure).

jnd_tbl <- stats_df %>%
  group_by(condition, means, .draw) %>%          # marginalize out other manipulations
  summarise(jnd = mean(jnd)) %>%
  compare_levels(jnd, by = means) %>%
  mean_qi()
jnd_p_tbl <- stats_df %>%
  group_by(condition, means, .draw) %>%          # marginalize out other manipulations
  summarise(jnd_p_award = mean(jnd_p_award)) %>%
  compare_levels(jnd_p_award, by = means) %>%
  mean_qi()
jnd_tbl %>% full_join(jnd_p_tbl, by = c("condition", "means"))
## # A tibble: 4 x 14
## # Groups:   condition [4]
##   condition means      jnd .lower.x .upper.x .width.x .point.x .interval.x
##   <fct>     <fct>    <dbl>    <dbl>    <dbl>    <dbl> <chr>    <chr>      
## 1 densities TRUE… -0.0138   -0.0802   0.0594     0.95 mean     qi         
## 2 intervals TRUE… -0.0800   -0.177    0.0141     0.95 mean     qi         
## 3 HOPs      TRUE…  0.00180  -0.0921   0.104      0.95 mean     qi         
## 4 QDPs      TRUE… -0.0254   -0.0825   0.0336     0.95 mean     qi         
## # … with 6 more variables: jnd_p_award <dbl>, .lower.y <dbl>, .upper.y <dbl>,
## #   .width.y <dbl>, .point.y <chr>, .interval.y <chr>

JNDs with and without means, marginalizing over variance. These numbers help us to contextualize the overall effect of visualization designs on JNDs.

stats_df %>%
  group_by(condition, means, .draw) %>%          # marginalize out other manipulations
  summarise(
    jnd = mean(jnd),
    jnd_p_award = mean(jnd_p_award)
  ) %>%
  mean_qi()
## # A tibble: 8 x 11
## # Groups:   condition [4]
##   condition means   jnd jnd.lower jnd.upper jnd_p_award jnd_p_award.low…
##   <fct>     <fct> <dbl>     <dbl>     <dbl>       <dbl>            <dbl>
## 1 densities FALSE 0.509     0.453     0.576      0.0641           0.0583
## 2 densities TRUE  0.495     0.430     0.578      0.0620           0.0556
## 3 intervals FALSE 0.559     0.486     0.654      0.0679           0.0612
## 4 intervals TRUE  0.480     0.410     0.575      0.0592           0.0528
## 5 HOPs      FALSE 0.598     0.517     0.700      0.0727           0.0650
## 6 HOPs      TRUE  0.600     0.505     0.718      0.0727           0.0637
## 7 QDPs      FALSE 0.443     0.395     0.500      0.0571           0.0519
## 8 QDPs      TRUE  0.418     0.366     0.481      0.0540           0.0484
## # … with 4 more variables: jnd_p_award.upper <dbl>, .width <dbl>, .point <chr>,
## #   .interval <chr>

Effect of High vs Low Variance

Now, let’s look at contrasts for the impact of the level of variance. This is an exploratory comparison.

stats_df %>%
  group_by(sd_diff, .draw) %>%          # marginalize out other manipulations (including means present/absent and vis condition)
  summarise(jnd = mean(jnd)) %>%
  compare_levels(jnd, by = sd_diff) %>%
  ggplot(aes(x = jnd, y = "effect of variance")) +
  stat_halfeyeh() +
  labs(x = "JND Difference (Variance high - low)") +
  theme_minimal()

Users seem to be consistently more sensitive to evidence (smaller JNDs) when uncertainty is high. This might be because charts in the high uncertainty condition use more of the chart space to convey effect size. Low uncertainty charts, by contrast, have a lot of white space, so smaller visual differences convey the same effect size.

Does Perceptual Accuracy Lead to Better Decision-Making?

We want to explore how perceptual bias as measured by LLO slopes impacts decision quality as measured by JND and PSE. To do this, we derive point estimates of LLO slope, JND, and PSE for each worker in our data set and combine these statistics into one dataframe.

# get linear log odds (LLO) slopes per worker
wrkr_llo_slopes_df <- model_df %>%
  group_by(worker_id, means, sd_diff, condition, trial, start_means) %>%
  data_grid(lo_ground_truth = c(0, 1)) %>%          # get fitted draws (in log odds units) only for ground truth of 0 and 1
  add_fitted_draws(m.p_sup, n = 500) %>%
  compare_levels(.value, by = lo_ground_truth) %>%  # calculate the difference between fits at 1 and 0 (i.e., slope)
  rename(llo_slope = .value) %>%
  group_by(worker_id, condition) %>%                # calculate point estimate of marginal LLO slope per worker
  summarise(llo_slope = weighted.mean(llo_slope))

# get logistic regression slopes per worker
wrkr_logistic_slopes_df <- model_df %>%
  group_by(worker_id, means, sd_diff, condition, trial, start_means) %>%
  data_grid(evidence = c(0, 1)) %>%
  add_fitted_draws(m.decisions, scale = "linear", n = 500, seed = 1234) %>%
  compare_levels(.value, by = evidence) %>%
  rename(slope = .value)

# get logistic regression intercepts per worker
wrkr_logistic_intercepts_df <- model_df %>%
  group_by(worker_id, means, sd_diff, condition, trial, start_means) %>%
  data_grid(evidence = 0) %>%
  add_fitted_draws(m.decisions, scale = "linear", n = 500, seed = 1234) %>%
  rename(intercept = .value) 

# join dataframes for logistic slopes and intercepts, calculate PSE and JND
wrkr_logistic_stats_df <- wrkr_logistic_slopes_df %>% 
  full_join(wrkr_logistic_intercepts_df, by = c("worker_id", "means", "sd_diff", "condition", "trial", "start_means", ".draw")) %>%
  mutate(
    pse = -intercept / slope,
    jnd = qlogis(0.75) / slope
  ) %>%
  group_by(worker_id, condition) %>%  # calculate point estimate of marginal JND and PSE per worker
  summarise(
    pse = weighted.mean(pse),
    jnd = weighted.mean(jnd)
  )

# join the dataframes of summary statistics per worker
wrkr_stats_df <- wrkr_llo_slopes_df %>%
  full_join(wrkr_logistic_stats_df, by = c("worker_id", "condition"))
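
The PSE and JND formulas used above follow from the logistic model on the log odds scale, logit(p) = intercept + slope * evidence. Setting p = 0.5 gives evidence = -intercept / slope (the PSE), and the extra evidence needed to move from p = 0.5 to p = 0.75 is qlogis(0.75) / slope (the JND). The following is a minimal sketch that checks this with base R. The coefficient values are hypothetical and chosen only for illustration; they are not estimates from m.decisions.

```r
# Hypothetical logit-scale coefficients (illustrative values, NOT estimates from m.decisions)
intercept <- -0.5
slope <- 2

# Point of subjective equality: evidence level where the predicted probability is 0.5
pse <- -intercept / slope

# Just-noticeable difference: extra evidence needed to move from 0.5 to 0.75
jnd <- qlogis(0.75) / slope

# Check both quantities by mapping back to the probability scale
p_at_pse <- plogis(intercept + slope * pse)
p_at_pse_plus_jnd <- plogis(intercept + slope * (pse + jnd))
```

Under these assumed values, p_at_pse and p_at_pse_plus_jnd recover 0.5 and 0.75. Note that a negative slope yields a negative JND, which is why the charts in this section keep only workers with jnd > 0.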

Prior work (Khaw et al., cited in paper) explained bias in PSE in terms of sensitivity to signal as measured by JND. Let’s plot these things together to see if we have a similar correspondence in our data.

wrkr_stats_df %>%
  filter(jnd > 0) %>%
  ggplot(aes(x = jnd, y = pse)) +
  geom_point(alpha = 0.35) +
  coord_cartesian(
    xlim = c(0, 10),
    ylim = c(-20, 20)
  ) +
  theme_minimal()

We see here that PSEs closer to zero are predicted by JNDs closer to zero. That is, workers who have greater sensitivity to effect size for the purpose of decision-making also tend to make more utility-optimal decisions.

However, since we have a separate task which gauges bias in the perception of effect size, we can also look at how performance on the estimation task predicts performance on the decision task.

Now let’s look at the relationship between LLO slopes and JNDs. This should give a rough indication of how much perceptual accuracy for effect size judgments translates into sensitivity to effect size information for the purpose of decision-making. We’ve had to filter some workers with extreme JNDs out of this view to get a chart we can read. The workers shown are the subset with JND estimates in a reasonable range.

wrkr_stats_df %>%
  filter(jnd > 0) %>%
  ggplot(aes(x = llo_slope, y = jnd)) +
  geom_point(alpha = 0.35) +
  coord_cartesian(ylim = c(0, 10)) +
  theme_minimal()

We can see that more of the high JNDs (indicating insensitivity) belong to workers with low LLO slopes (indicating a tendency to underestimate effect size). However, most workers have relatively small JNDs across the full range of observed LLO slopes, suggesting that perceptual accuracy and sensitivity are only loosely linked, with additional factors probably impacting decision-making.
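
One way to put a rough number on this kind of loose association is a rank correlation, which is less sensitive to the extreme JNDs than a Pearson correlation. This is only a sketch of an optional check, not part of our analysis. The data frame toy_stats_df is a hypothetical stand-in for wrkr_stats_df with made-up values.

```r
# Toy stand-in for wrkr_stats_df (hypothetical values, for illustration only)
toy_stats_df <- data.frame(
  worker_id = c("w1", "w2", "w3", "w4", "w5"),
  llo_slope = c(0.2, 0.4, 0.6, 0.8, 1.0),
  jnd       = c(9.0, 1.5, 2.5, 0.8, -3.0)  # a negative JND arises from a negative logistic slope
)

# Keep positive JNDs, mirroring the filter used for the scatterplot
kept <- subset(toy_stats_df, jnd > 0)

# Spearman rank correlation between LLO slope and JND
rho <- cor(kept$llo_slope, kept$jnd, method = "spearman")
```

The same call could be applied to wrkr_stats_df after filter(jnd > 0). A value of rho near zero would be consistent with a weak link, while a strongly negative value would indicate that higher LLO slopes go with smaller JNDs.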

What about the relationship between LLO slopes and PSE? This should give a rough sense of how much perceptual bias translates into bias in decision-making. Again, we’ve had to filter some workers with extreme PSEs out of this view to get a chart we can read.

wrkr_stats_df %>%
  ggplot(aes(x = llo_slope, y = pse)) +
  geom_point(alpha = 0.35) +
  coord_cartesian(ylim = c(-20, 20)) +
  theme_minimal()

Here again we see that the most extreme biases in decision-making (PSE far from 0) tend to correspond with the most extreme tendency to underestimate effect size (slopes less than 1). While biases in decision-making are less common among users with more accurate effect size judgments, the opposite is not the case: there are many users with poor perceptual accuracy who make close to utility-optimal decisions. This suggests that perceptual accuracy does not determine a user’s ability to make a decision. The implication for the visualization community is that we need to seek a better understanding of how performance on these tasks is related.

Part of this mismatch between perceptual performance and decision-making performance may be explained by the fact that our magnitude estimation task was more difficult than the decision task. Some users struggled with the more granular response scale of probability of superiority in pilot testing. By comparison, a binary decision is rather straightforward. We also incentivized the decision task and not the magnitude estimation task. Although we told participants that the best way to maximize their bonus was to answer both questions to the best of their ability, some participants may have sped through the probability of superiority judgments and focused on the decision task. This might explain some of the mismatch between performance on the two tasks.